A dataset is a list of files in *RecordIO* format. A RecordIO file consists of chunks.
## Task Queue
As mentioned in the [distributed training design doc](./README.md), a *task* is a data shard that the master server assigns to a trainer process to train on. A task consists of one or multiple *chunks* from one or multiple files. The master server maintains *task queues* to track the training progress.
### Task Queue Creation
...
...
1. The master server will scan through each RecordIO file to generate the *chunk index* and learn how many chunks each file has. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is an in-memory data structure that enables fast access to each chunk; the index of a chunk within the file is an integer starting from 0, representing the n-th chunk within the file.
The definition of the chunk is:
```go
type Chunk struct {
    Idx   int            // index of the chunk within the file
    Path  string
    Index recordio.Index // chunk index
}
```
1. Chunks are grouped into tasks, and the tasks are filled into the todo queue. The pending queue and the done queue are initialized with no elements, as sketched below.
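To make the grouping and the three queues concrete, here is a minimal illustrative sketch. The master server's actual `Chunk` definition is the Go struct above; the C++ rendering below is only an assumption-level illustration, and every name in it other than the chunk fields (`Task`, `TaskQueues`, the queue names) is hypothetical rather than taken from the implementation.
```cpp
#include <deque>
#include <string>
#include <vector>

// Mirrors the Go Chunk struct above, for illustration only.
struct Chunk {
  int idx;           // index of the chunk within the file
  std::string path;  // path of the RecordIO file
};

// A task is one or more chunks, possibly drawn from several files.
struct Task {
  int id;
  std::vector<Chunk> chunks;
};

// The three task queues maintained by the master server.
struct TaskQueues {
  std::deque<Task> todo;     // filled with all tasks when the queues are created
  std::deque<Task> pending;  // tasks dispatched to trainers, awaiting completion
  std::deque<Task> done;     // finished tasks; starts empty, like pending
};
```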
The trainer selection process is encapsulated in the C API function:
```c
int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);
```
The selected trainer's call to `paddle_begin_init_params` will return 1, while the other trainers' calls to `paddle_begin_init_params` will return 0. `paddle_get_params` will block until initialization is completed. As illustrated below:
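As a rough illustration of the trainer-side flow this protocol implies, here is a minimal sketch. Only the `paddle_begin_init_params` declaration and the blocking behaviour of `paddle_get_params` come from this document; everything else in the sketch, including how the selected trainer pushes its randomly initialized parameters, is a placeholder rather than the actual client API.
```cpp
// Declaration taken from the C API shown above.
struct paddle_pserver_client;
extern "C" int paddle_begin_init_params(paddle_pserver_client* client,
                                        const char* config_proto);

// Hypothetical trainer-side initialization flow; not the real Paddle client code.
void initialize_parameters(paddle_pserver_client* client, const char* config_proto) {
  if (paddle_begin_init_params(client, config_proto) == 1) {
    // This trainer was selected: randomly initialize the parameters locally
    // and send them to the parameter servers (the actual calls are omitted,
    // since they are not part of this document).
  }
  // Selected or not, every trainer then fetches the initialized parameters,
  // e.g. via paddle_get_params(...), which blocks until initialization has
  // completed. Its exact signature is not shown in this document, so it is
  // only referenced in this comment.
}
```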
# Design Doc: The C++ Class `Parameters`
`Parameters` is a concept we designed in the Paddle V2 API. `Parameters` is a container of parameters that lets Paddle share parameters between topologies. We described the usage of `Parameter` in [api.md](./api.md).
We initially implemented Parameters in Python when designing the V2 API. The current implementation has several defects:
* We just use `memcpy` to share Parameters between topologies, which is very inefficient.
* We did not implement sharing Parameters during training; we only trigger `memcpy` when training starts.
It is necessary to implement Parameters on the C++ side. However, this requires some code refactoring in Paddle, because Paddle was previously designed to train only one topology, i.e., each GradientMachine contains its Parameter as a data member. In the current Paddle implementation, there are three concepts associated with `Parameters`:
1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`.
It is evident that we should use `paddle::Parameter` when developing `Parameters`.
However, the `Parameter` class contains many functions and does not have a clear interface.
It contains `create/store Parameter`, `serialize/deserialize`, `optimize(i.e SGD)`, and `randomize/zero`.
When developing `Parameters`, we only use the `create/store Parameter` functionality.
We should extract the functionalities of Parameter into separate classes to clean up the Paddle C++ implementation (a possible split is sketched after this list).
2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`.
We should pass `Parameters` to `paddle::GradientMachine` during `forward/backward` to avoid `memcpy` between topologies.
Also, we should handle multi-GPU/CPU training, because `forward` and `backward` would run on multiple GPUs and CPUs.
`Parameters` should dispatch the parameter value to each device, and gather the parameter gradient from each device.
3. `paddle::ParameterUpdater`. The ParameterUpdater is used to update parameters in Paddle.
So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (e.g., by SGD).
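Concept 1 above suggests splitting `paddle::Parameter`'s responsibilities before building `Parameters` on top of it. The split below is purely a hypothetical sketch based on the functionality groups listed above (create/store, serialize/deserialize, optimize, randomize/zero); the document does not prescribe these interfaces or names.
```cpp
#include <cstddef>
#include <string>

// Hypothetical decomposition of paddle::Parameter's responsibilities into
// narrow interfaces, so that Parameters would only depend on the
// create/store part.
class ParameterStorage {        // create/store Parameter
 public:
  virtual void resize(size_t size) = 0;
  virtual size_t size() const = 0;
  virtual ~ParameterStorage() = default;
};

class ParameterSerializer {     // serialize/deserialize
 public:
  virtual std::string serialize() const = 0;
  virtual void deserialize(const std::string& buf) = 0;
  virtual ~ParameterSerializer() = default;
};

class ParameterOptimizer {      // optimize (e.g., SGD)
 public:
  virtual void update(float learning_rate) = 0;
  virtual ~ParameterOptimizer() = default;
};

class ParameterInitializer {    // randomize/zero
 public:
  virtual void randomize() = 0;
  virtual void zero() = 0;
  virtual ~ParameterInitializer() = default;
};
```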
The step-by-step approach for implementing Parameters in the Paddle C++ core is listed below. Each step should be a separate PR that can be merged into Paddle one by one.
1. Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of Parameters.
2. Implement a `Parameters` class. It just stores the `paddle::Parameter` objects inside. Make `GradientMachine` use `Parameters` as a class member (see the sketch after this list).
3. Make `Parameters` support multi-CPU and multi-GPU training to prepare for sharing `Parameter` between topologies.
Because we need to share `Parameters` between topologies, it is `Parameters`'s responsibility to exchange Parameters between GPUs.
`GradientMachine` should not handle how to exchange Parameters, because a `GradientMachine` only trains one topology and we need to support training many topologies in Paddle, i.e., many GradientMachines could use one `Parameters` object.
* We should use a global function to exchange Parameters between GPUs, not a member function of `Parameters`. The `MultiGradientMachine` invokes this function, which takes `Parameters` as its input.
* The MultiGradientMachine contains many functionalities. Extracting the Parameters-exchanging logic would make MultiGradientMachine clearer and simpler.
4. Make `Parameters` an argument of the `forward/backward` functions rather than a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle can share `Parameters` between topologies.
5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. At the end of this refactoring, we could change `ParameterUpdater` to use `Parameters` directly, making `ParameterUpdater`'s implementation clearer.
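To make steps 2 and 4 concrete, here is a minimal sketch of what the `Parameters` container and the new `forward/backward` signatures could look like. This is an assumption-level illustration: apart from `paddle::Parameter`, `GradientMachine`, and the `forward(const Parameters& params, ...)` / `backward(Parameters* params, ...)` signatures mentioned above, every name and detail below is hypothetical.
```cpp
#include <memory>
#include <string>
#include <unordered_map>

namespace paddle {

class Parameter {};  // stand-in for the existing paddle::Parameter class

// Step 2: a thin container that stores paddle::Parameter objects by name.
// shared_ptr lets several topologies reference the same parameter without memcpy.
class Parameters {
 public:
  void add(const std::string& name, std::shared_ptr<Parameter> param) {
    params_[name] = std::move(param);
  }

  std::shared_ptr<Parameter> get(const std::string& name) const {
    auto it = params_.find(name);
    return it == params_.end() ? nullptr : it->second;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<Parameter>> params_;
};

// Step 4: forward/backward take Parameters as arguments instead of reading a
// GradientMachine data member, so several GradientMachines (topologies) can
// share one Parameters object. This only sketches the proposed signatures,
// not the real GradientMachine interface.
class GradientMachine {
 public:
  virtual void forward(const Parameters& params /*, inputs, outputs, ... */) = 0;
  virtual void backward(Parameters* params /*, update callback, ... */) = 0;
  virtual ~GradientMachine() = default;
};

}  // namespace paddle
```
Whether the container keys parameters by name or by index, and whether it owns them or only references them, are design choices this document leaves open; the sketch only shows the sharing-by-argument shape of steps 2 and 4.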
sequence. It calculates precision, recall and F1 scores for the chunk detection.</p>
<p>To use the chunk evaluator, several concepts need to be clarified first.</p>
<ulclass="simple">
<li><strong>Chunk type</strong> is the type of the whole chunk and a chunk consists of one or several words. (For example in NER, ORG for organization name, PER for person name etc.)</li>
<li><strong>Tag type</strong> indicates the position of a word in a chunk. (B for begin, I for inside, E for end, S for single)</li>
</ul>
<p>We can name a label by combining tag type and chunk type. (i.e. B-ORG for the beginning of an organization name)</p>
<p>The construction of the label dictionary should obey the following rules:</p>
<ulclass="simple">
<li>Use one of the listed labelling schemes. These schemes differ in the way they indicate chunk boundaries.</li>
currently supports rectangular filters, the filter’s
shape will be (filter_size, filter_size_y).</li>
<li><strong>num_filters</strong>– The number of filters in each group.</li>
<li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) – Activation type. Default is tanh</li>
<li><strong>groups</strong> (<em>int</em>) – Group size of filters.</li>
<li><strong>stride</strong> (<em>int|tuple|list</em>) – The x dimension of the stride. Or input a tuple for two image
dimensions.</li>
...
...
<spanid="api-v2-layer-context-projection"></span><h3>context_projection<aclass="headerlink"href="#context-projection"title="Permalink to this headline">¶</a></h3>
<trclass="field-even field"><thclass="field-name">Returns:</th><tdclass="field-body"><pclass="first">LayerOutput object which is a memory.</p>
<trclass="field-even field"><thclass="field-name">Returns:</th><tdclass="field-body"><pclass="first">paddle.v2.config_base.Layer object which is a memory.</p>
<p>paddle.v2.config_base.Layer will be scattered into time steps.
SubsequenceInput will be scattered into sequence steps.
StaticInput will be imported to each time step, and doesn’t change
through time. It’s a mechanism to access layer outside step function.</p>
</li>
<li><strong>reverse</strong> (<em>bool</em>) – If reverse is set true, the recurrent unit will process the
input sequence in a reverse order.</li>
<li><strong>targetInlink</strong> (<em>paddle.v2.config_base.Layer|SubsequenceInput</em>) –<p>the input layer which share info with layer group’s output</p>
<p>Param input specifies multiple input layers. For
SubsequenceInput inputs, config should assign one input
layer that share info(the number of sentences and the number
of words in each sentence) with all layer group’s outputs.
targetInlink should be one of the layer group’s input.</p>
</li>
<li><strong>is_generating</strong>– If is generating, none of input type should be paddle.v2.config_base.Layer;
else, for training or testing, one of the input type must
<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) – The Bias Attribute. If no bias, then pass False or
something not type of paddle.v2.attr.ParameterAttribute. None will get a
default Bias.</li>
<li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – The extra layer config. Default is None.</li>
</ul>
</td>
</tr>
...
...
<spanid="api-v2-layer-embedding"></span><h3>embedding<aclass="headerlink"href="#embedding"title="Permalink to this headline">¶</a></h3>
<li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – Extra Layer Attribute.</li>
<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) – The Bias Attribute. If no bias, then pass False or
something not type of paddle.v2.attr.ParameterAttribute. None will get a
...
...
<h3>block_expand<aclass="headerlink"href="#block-expand"title="Permalink to this headline">¶</a></h3>
<li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – extra layer attributes.</li>
<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) – The Bias Attribute. If no bias, then pass False or
something not type of paddle.v2.attr.ParameterAttribute. None will get a
...
...
<h3>addto<aclass="headerlink"href="#addto"title="Permalink to this headline">¶</a></h3>
<li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple</em>) – Input layers. It could be a paddle.v2.config_base.Layer or list/tuple of
paddle.v2.config_base.Layer.</li>
<li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) – Activation Type, default is tanh.</li>
<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|bool</em>) – Bias attribute. If False, means no bias. None is default
bias.</li>
<li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – Extra Layer attribute.</li>
...
...
<h3>linear_comb<aclass="headerlink"href="#linear-comb"title="Permalink to this headline">¶</a></h3>
Input should be a vector of positive numbers, without normalization.</p>
<h3>multi_binary_label_cross_entropy_cost<aclass="headerlink"href="#multi-binary-label-cross-entropy-cost"title="Permalink to this headline">¶</a></h3>
<li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) – weight layer, can be None(default)</li>
<li><strong>num_classes</strong> (<em>int</em>) – number of classes.</li>
<li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) – Activation, default is Sigmoid.</li>
<li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) – The Parameter Attribute|list.</li>
<li><strong>num_neg_samples</strong> (<em>int</em>) – number of negative samples. Default is 10.</li>
<li><strong>neg_distribution</strong> (<em>list|tuple|collections.Sequence|None</em>) – The distribution for generating the random negative labels.
...
...
If not None, its length must be equal to num_classes.</li>
<h3>hsigmoid<aclass="headerlink"href="#hsigmoid"title="Permalink to this headline">¶</a></h3>
<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) – context projection parameter attribute.
None if user don’t care.</li>
<li><strong>fc_layer_name</strong> (<em>basestring</em>) – fc layer name. None if user don’t care.</li>
<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) – fc layer parameter attribute. None if user don’t care.</li>
<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) – fc bias parameter attribute. False if no bias,
None if user don’t care.</li>
<li><strong>fc_act</strong> (<em>BaseActivation</em>) – fc layer activation type. None means tanh</li>
<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) – pooling layer bias attr. None if don’t care.
False if no bias.</li>
<li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) – fc layer extra attribute.</li>
<li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) – context projection layer extra attribute.</li>
<li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) – pooling layer extra attribute.</li>
<spanid="api-trainer-config-helpers-network-text-conv-pool"></span><h3>text_conv_pool<aclass="headerlink"href="#text-conv-pool"title="Permalink to this headline">¶</a></h3>
<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) – context projection parameter attribute.
None if user don’t care.</li>
<li><strong>fc_layer_name</strong> (<em>basestring</em>) – fc layer name. None if user don’t care.</li>
<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) – fc layer parameter attribute. None if user don’t care.</li>
<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) – fc bias parameter attribute. False if no bias,
None if user don’t care.</li>
<li><strong>fc_act</strong> (<em>BaseActivation</em>) – fc layer activation type. None means tanh</li>
<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) – pooling layer bias attr. None if don’t care.
False if no bias.</li>
<li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) – fc layer extra attribute.</li>
<li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) – context projection layer extra attribute.</li>
<li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) – pooling layer extra attribute.</li>
<spanid="api-trainer-config-helpers-network-simple-img-conv-pool"></span><h3>simple_img_conv_pool<aclass="headerlink"href="#simple-img-conv-pool"title="Permalink to this headline">¶</a></h3>
<dd><p>Same model from <aclass="reference external"href="https://gist.github.com/ksimonyan/211839e770f7b538e2d8">https://gist.github.com/ksimonyan/211839e770f7b538e2d8</a></p>
<li><strong>return_seq</strong> (<em>bool</em>) – If set False, outputs of the last time step are
concatenated and returned.
...
...
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">Returns:</th><tdclass="field-body"><pclass="first">paddle.v2.config_base.Layer object accroding to the return_seq.</p>
<trclass="field-even field"><thclass="field-name">Returns:</th><tdclass="field-body"><pclass="first">LayerOutput object accroding to the return_seq.</p>
<spanid="task-queue"></span><h2>Task Queue<aclass="headerlink"href="#task-queue"title="Permalink to this headline">¶</a></h2>
<p>As mentioned in <aclass="reference internal"href="README.html"><spanclass="doc">distributed training design doc</span></a>, a <em>task</em> is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple <em>blocks</em> from one or multiple files. The master server maintains <em>task queues</em> to track the training progress.</p>
<p>As mentioned in <aclass="reference internal"href="README.html"><spanclass="doc">distributed training design doc</span></a>, a <em>task</em> is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple <em>chunks</em> from one or multiple files. The master server maintains <em>task queues</em> to track the training progress.</p>
<divclass="section"id="task-queue-creation">
<spanid="task-queue-creation"></span><h3>Task Queue Creation<aclass="headerlink"href="#task-queue-creation"title="Permalink to this headline">¶</a></h3>
<ol>
...
...
@@ -197,21 +197,21 @@
</pre></div>
</div>
</li>
<li><pclass="first">The master server will scan through each RecordIO file to generate the <em>block index</em> and know how many blocks does each file have. A block can be referenced by the file path and the index of the block within the file. The block index is in memory data structure that enables fast access to each block, and the index of the block with the file is an integer start from 0, representing the n-th block within the file.</p>
<spanclass="nx">Idx</span><spanclass="kt">int</span><spanclass="c1">// index of the block within the file</span>
<li><pclass="first">The master server will scan through each RecordIO file to generate the <em>chunk index</em> and know how many chunks does each file have. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is in memory data structure that enables fast access to each chunk, and the index of the chunk with the file is an integer start from 0, representing the n-th chunk within the file.</p>
<li><pclass="first">Blocks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.</p>
<li><pclass="first">Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.</p>
<p>The selected trainer’s call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will return with 1, and the other trainers’ call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will block until initialization is done, and return 0. As illustrated below:</p>
<p>The selected trainer’s call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will return with 1, and the other trainers’ call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will return 0. <codeclass="docutils literal"><spanclass="pre">paddle_get_params</span></code> will be blocked until initialization is completed. As illustrated below:</p>
<p><imgsrc="./src/pserver_init.png"></p>
</div>
</div>
...
...
<spanclass="cm"> *</span>
<spanclass="cm"> * paddle_begin_init_params will be called from multiple trainers,</span>
<spanclass="cm"> * only one trainer will be selected to initialize the parameters on</span>
<spanclass="cm"> * parameter servers. Other trainers will be blocked until the</span>
<spanclass="cm"> * initialization is done, and they need to get the initialized</span>
<spanclass="cm"> * parameter servers. Other trainers need to get the initialized</span>
<spanclass="cm"> * parameters from parameter servers using @paddle_get_params.</span>
<spanclass="cm"> *</span>
<spanclass="cm"> * @param pserver_config_proto serialized parameter server configuration in</span>
<liclass="toctree-l2"><aclass="reference internal"href="../getstarted/build_and_install/index_en.html">Install and Build</a><ul>
<liclass="toctree-l3"><aclass="reference internal"href="../getstarted/build_and_install/docker_install_en.html">PaddlePaddle in Docker Containers</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/usage/k8s/k8s_en.html">Paddle On Kubernetes</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/usage/k8s/k8s_aws_en.html">Distributed PaddlePaddle Training on AWS with Kubernetes</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/dev/new_layer_en.html">Write New Layers</a></li>
<spanid="design-doc-the-c-class-parameters"></span><h1>Design Doc: The C++ Class <codeclass="docutils literal"><spanclass="pre">Parameters</span></code><aclass="headerlink"href="#design-doc-the-c-class-parameters"title="Permalink to this headline">¶</a></h1>
<p><codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a concept we designed in Paddle V2 API. <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a container of parameters, and make Paddle can shared parameter between topologies. We described usages of <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> in <aclass="reference internal"href="api.html"><spanclass="doc">api.md</span></a>.</p>
<p>We used Python to implement Parameters when designing V2 API before. There are several defects for current implementation:</p>
<ulclass="simple">
<li>We just use <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> to share Parameters between topologies, but this is very inefficient.</li>
<li>We did not implement share Parameters while training. We just trigger <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> when start training.</li>
</ul>
<p>It is necessary that we implement Parameters in CPP side. However, it could be a code refactoring for Paddle, because Paddle was designed for training only one topology before, i.e., each GradientMachine contains its Parameter as a data member. In current Paddle implementation, there are three concepts associated with <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>:</p>
<olclass="simple">
<li><codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code>. A <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a container for <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code>.
It is evident that we should use <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> when developing <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>.
However, the <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> class contains many functions and does not have a clear interface.
It contains <codeclass="docutils literal"><spanclass="pre">create/store</span><spanclass="pre">Parameter</span></code>, <codeclass="docutils literal"><spanclass="pre">serialize/deserialize</span></code>, <codeclass="docutils literal"><spanclass="pre">optimize(i.e</span><spanclass="pre">SGD)</span></code>, <codeclass="docutils literal"><spanclass="pre">randomize/zero</span></code>.
When we developing <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>, we only use <codeclass="docutils literal"><spanclass="pre">create/store</span><spanclass="pre">Parameter</span></code> functionality.
We should extract functionalities of Parameter into many classes to clean Paddle CPP implementation.</li>
<li><codeclass="docutils literal"><spanclass="pre">paddle::GradientMachine</span></code> and its sub-classes, e.g., <codeclass="docutils literal"><spanclass="pre">paddle::MultiGradientMachine</span></code>, <codeclass="docutils literal"><spanclass="pre">paddle::NeuralNetwork</span></code>.
We should pass <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> to <codeclass="docutils literal"><spanclass="pre">paddle::GradientMachine</span></code> when <codeclass="docutils literal"><spanclass="pre">forward/backward</span></code> to avoid <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> between topologies.
Also, we should handle multi-GPU/CPU training, because <codeclass="docutils literal"><spanclass="pre">forward</span></code> and <codeclass="docutils literal"><spanclass="pre">backward</span></code> would perform on multi-GPUs and multi-CPUs.
<codeclass="docutils literal"><spanclass="pre">Parameters</span></code> should dispatch the parameter value to each device, and gather the parameter gradient from each device.</li>
<li><codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code>. The ParameterUpdater is used to update parameters in Paddle.
So <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> should be used by <codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code>, and <codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code> should optimize <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> (by SGD).</li>
</ol>
<p>The step by step approach for implementation Parameters in Paddle C++ core is listed below. Each step should be a PR and could be merged into Paddle one by one.</p>
<olclass="simple">
<li>Clean <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> interface. Extract the functionalities of <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> to prepare for the implementation of Parameters.</li>
<li>Implementation a <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> class. It just stores the <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> inside. Make <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as a class member.</li>
<li>Make <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> support Multi-CPU and Multi-GPU training to prepare for sharing <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> between topologies.
Because we need share <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> between topologies, it is <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>‘s response to exchange Parameters between GPUs.
<codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> should not handle how to exchange Parameters because <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> only used to train one topology and we need to support train many topologies in Paddle, i.e., there could be many GradientMachines use one <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>.<ul>
<li>We should use a global function to exchange Parameters between GPUs, not a member function in <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>. The <codeclass="docutils literal"><spanclass="pre">MultiGradientMachine</span></code> invoke this function, which uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as this function inputs.</li>
<li>The MultiGradientMachine contains many functionalities. Extracting the Parameters exchanging logic could make MultiGradientMachine clearer and simpler.</li>
</ul>
</li>
<li>Make <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as an argument for <codeclass="docutils literal"><spanclass="pre">forward/backward</span></code> function, not a data member for <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code>. For example, <codeclass="docutils literal"><spanclass="pre">forward</span></code> could be <codeclass="docutils literal"><spanclass="pre">forward(const</span><spanclass="pre">Parameters&</span><spanclass="pre">params,</span><spanclass="pre">...)</span></code> and <codeclass="docutils literal"><spanclass="pre">backward</span></code> could be <codeclass="docutils literal"><spanclass="pre">backward(Parameters*</span><spanclass="pre">params,</span><spanclass="pre">...)</span></code>. After this step, Paddle could share <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> between topologies.</li>
<li><codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code> is invoked by <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> and <codeclass="docutils literal"><spanclass="pre">Trainer</span></code>, but it updates <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>. In the end of this code refactoring, we could change <codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code> directly uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> to make <codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code>‘s implementation clear.</li>
<trclass="field-even field"><thclass="field-name">返回:</th><tdclass="field-body"><pclass="first">LayerOutput object which is a memory.</p>
<trclass="field-even field"><thclass="field-name">返回:</th><tdclass="field-body"><pclass="first">paddle.v2.config_base.Layer object which is a memory.</p>
<p>paddle.v2.config_base.Layer will be scattered into time steps.
SubsequenceInput will be scattered into sequence steps.
StaticInput will be imported to each time step, and doesn’t change
through time. It’s a mechanism to access layer outside step function.</p>
</li>
<li><strong>reverse</strong> (<em>bool</em>) – If reverse is set true, the recurrent unit will process the
input sequence in a reverse order.</li>
<li><strong>targetInlink</strong> (<em>LayerOutput|SubsequenceInput</em>) –<p>the input layer which share info with layer group’s output</p>
<li><strong>targetInlink</strong> (<em>paddle.v2.config_base.Layer|SubsequenceInput</em>) –<p>the input layer which share info with layer group’s output</p>
<p>Param input specifies multiple input layers. For
SubsequenceInput inputs, config should assign one input
layer that share info(the number of sentences and the number
of words in each sentence) with all layer group’s outputs.
targetInlink should be one of the layer group’s input.</p>
</li>
<li><strong>is_generating</strong>– If is generating, none of input type should be LayerOutput;
<li><strong>is_generating</strong>– If is generating, none of input type should be paddle.v2.config_base.Layer;
else, for training or testing, one of the input type must
<li><strong>bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) – The Bias Attribute. If no bias, then pass False or
something not type of ParameterAttribute. None will get a
<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) – The Bias Attribute. If no bias, then pass False or
something not type of paddle.v2.attr.ParameterAttribute. None will get a
default Bias.</li>
<li><strong>layer_attr</strong> (<em>ExtraLayerAttribute</em>) – The extra layer config. Default is None.</li>
<li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – The extra layer config. Default is None.</li>
<li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – Extra Layer Attribute.</li>
<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) – The Bias Attribute. If no bias, then pass False or
something not type of paddle.v2.attr.ParameterAttribute. None will get a
<li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – extra layer attributes.</li>
<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) – The Bias Attribute. If no bias, then pass False or
something not type of paddle.v2.attr.ParameterAttribute. None will get a
<li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple</em>) – Input layers. It could be a paddle.v2.config_base.Layer or list/tuple of
paddle.v2.config_base.Layer.</li>
<li><strong>act</strong> (<em>paddle.v2.Activation.Base</em>) – Activation Type, default is tanh.</li>
<li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) – Activation Type, default is tanh.</li>
<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|bool</em>) – Bias attribute. If False, means no bias. None is default
bias.</li>
<li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – Extra Layer attribute.</li>
<li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) – weight layer, can be None(default)</li>
<li><strong>num_classes</strong> (<em>int</em>) – number of classes.</li>
<li><strong>act</strong> (<em>paddle.v2.Activation.Base</em>) – Activation, default is Sigmoid.</li>
<li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) – Activation, default is Sigmoid.</li>
<li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) – The Parameter Attribute|list.</li>
<li><strong>num_neg_samples</strong> (<em>int</em>) – number of negative samples. Default is 10.</li>
<li><strong>neg_distribution</strong> (<em>list|tuple|collections.Sequence|None</em>) – The distribution for generating the random negative labels.
...
...
@@ -3400,7 +3375,7 @@ If not None, its length must be equal to num_classes.</li>
<li><strong>context_proj_param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None.</em>) – context projection parameter attribute.
<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) – context projection parameter attribute.
None if user don’t care.</li>
<li><strong>fc_name</strong> (<em>basestring</em>) – fc layer name. None if user don’t care.</li>
<li><strong>fc_param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em>) – fc layer parameter attribute. None if user don’t care.</li>
<li><strong>fc_bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em>) – fc bias parameter attribute. False if no bias,
<li><strong>fc_layer_name</strong> (<em>basestring</em>) – fc layer name. None if user don’t care.</li>
<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) – fc layer parameter attribute. None if user don’t care.</li>
<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) – fc bias parameter attribute. False if no bias,
None if user don’t care.</li>
<li><strong>fc_act</strong> (<em>paddle.v2.Activation.Base</em>) – fc layer activation type. None means tanh</li>
<li><strong>pool_bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None.</em>) – pooling layer bias attr. None if don’t care.
<li><strong>fc_act</strong> (<em>BaseActivation</em>) – fc layer activation type. None means tanh</li>
<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) – pooling layer bias attr. None if don’t care.
False if no bias.</li>
<li><strong>fc_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – fc layer extra attribute.</li>
<li><strong>context_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – context projection layer extra attribute.</li>
<li><strong>pool_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – pooling layer extra attribute.</li>
<li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) – fc layer extra attribute.</li>
<li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) – context projection layer extra attribute.</li>
<li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) – pooling layer extra attribute.</li>
<li><strong>context_proj_param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None.</em>) – context projection parameter attribute.
<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) – context projection parameter attribute.
None if user don’t care.</li>
<li><strong>fc_name</strong> (<em>basestring</em>) – fc layer name. None if user don’t care.</li>
<li><strong>fc_param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em>) – fc layer parameter attribute. None if user don’t care.</li>
<li><strong>fc_bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em>) – fc bias parameter attribute. False if no bias,
<li><strong>fc_layer_name</strong> (<em>basestring</em>) – fc layer name. None if user don’t care.</li>
<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) – fc layer parameter attribute. None if user don’t care.</li>
<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) – fc bias parameter attribute. False if no bias,
None if user don’t care.</li>
<li><strong>fc_act</strong> (<em>paddle.v2.Activation.Base</em>) – fc layer activation type. None means tanh</li>
<li><strong>pool_bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None.</em>) – pooling layer bias attr. None if don’t care.
<li><strong>fc_act</strong> (<em>BaseActivation</em>) – fc layer activation type. None means tanh</li>
<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) – pooling layer bias attr. None if don’t care.
False if no bias.</li>
<li><strong>fc_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – fc layer extra attribute.</li>
<li><strong>context_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – context projection layer extra attribute.</li>
<li><strong>pool_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) – pooling layer extra attribute.</li>
<li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) – fc layer extra attribute.</li>
<li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) – context projection layer extra attribute.</li>
<li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) – pooling layer extra attribute.</li>
<dd><p>Same model from <aclass="reference external"href="https://gist.github.com/ksimonyan/211839e770f7b538e2d8">https://gist.github.com/ksimonyan/211839e770f7b538e2d8</a></p>
<li><strong>return_seq</strong> (<em>bool</em>) – If set False, outputs of the last time step are
concatenated and returned.
...
...
@@ -646,10 +646,10 @@ concatenated and returned.</li>
</ul>
</td>
</tr>
<trclass="field-even field"><thclass="field-name">返回:</th><tdclass="field-body"><pclass="first">paddle.v2.config_base.Layer object accroding to the return_seq.</p>
<trclass="field-even field"><thclass="field-name">返回:</th><tdclass="field-body"><pclass="first">LayerOutput object accroding to the return_seq.</p>
<p>As mentioned in <aclass="reference internal"href="README.html"><spanclass="doc">distributed training design doc</span></a>, a <em>task</em> is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple <em>blocks</em> from one or multiple files. The master server maintains <em>task queues</em> to track the training progress.</p>
<p>As mentioned in <aclass="reference internal"href="README.html"><spanclass="doc">distributed training design doc</span></a>, a <em>task</em> is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple <em>chunks</em> from one or multiple files. The master server maintains <em>task queues</em> to track the training progress.</p>
<li><pclass="first">The master server will scan through each RecordIO file to generate the <em>block index</em> and know how many blocks does each file have. A block can be referenced by the file path and the index of the block within the file. The block index is in memory data structure that enables fast access to each block, and the index of the block with the file is an integer start from 0, representing the n-th block within the file.</p>
<spanclass="nx">Idx</span><spanclass="kt">int</span><spanclass="c1">// index of the block within the file</span>
<li><pclass="first">The master server will scan through each RecordIO file to generate the <em>chunk index</em> and know how many chunks does each file have. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is in memory data structure that enables fast access to each chunk, and the index of the chunk with the file is an integer start from 0, representing the n-th chunk within the file.</p>
<li><pclass="first">Blocks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.</p>
<li><pclass="first">Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.</p>
<p>The selected trainer’s call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will return with 1, and the other trainers’ call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will block until initialization is done, and return 0. As illustrated below:</p>
<p>The selected trainer’s call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will return with 1, and the other trainers’ call to <codeclass="docutils literal"><spanclass="pre">paddle_begin_init_params</span></code> will return 0. <codeclass="docutils literal"><spanclass="pre">paddle_get_params</span></code> will be blocked until initialization is completed. As illustrated below:</p>
<p><imgsrc="./src/pserver_init.png"></p>
</div>
</div>
...
...
@@ -266,16 +266,13 @@ name:sparse-n-1
<spanclass="cm"> *</span>
<spanclass="cm"> * paddle_begin_init_params will be called from multiple trainers,</span>
<spanclass="cm"> * only one trainer will be selected to initialize the parameters on</span>
<spanclass="cm"> * parameter servers. Other trainers will be blocked until the</span>
<spanclass="cm"> * initialization is done, and they need to get the initialized</span>
<spanclass="cm"> * parameter servers. Other trainers need to get the initialized</span>
<spanclass="cm"> * parameters from parameter servers using @paddle_get_params.</span>
<spanclass="cm"> *</span>
<spanclass="cm"> * @param pserver_config_proto serialized parameter server configuration in</span>
<spanid="design-doc-the-c-class-parameters"></span><h1>Design Doc: The C++ Class <codeclass="docutils literal"><spanclass="pre">Parameters</span></code><aclass="headerlink"href="#design-doc-the-c-class-parameters"title="永久链接至标题">¶</a></h1>
<p><codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a concept we designed in Paddle V2 API. <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a container of parameters, and make Paddle can shared parameter between topologies. We described usages of <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> in <aclass="reference internal"href="api.html"><spanclass="doc">api.md</span></a>.</p>
<p>We used Python to implement Parameters when designing V2 API before. There are several defects for current implementation:</p>
<ulclass="simple">
<li>We just use <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> to share Parameters between topologies, but this is very inefficient.</li>
<li>We did not implement share Parameters while training. We just trigger <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> when start training.</li>
</ul>
<p>It is necessary that we implement Parameters in CPP side. However, it could be a code refactoring for Paddle, because Paddle was designed for training only one topology before, i.e., each GradientMachine contains its Parameter as a data member. In current Paddle implementation, there are three concepts associated with <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>:</p>
<olclass="simple">
<li><codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code>. A <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> is a container for <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code>.
It is evident that we should use <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> when developing <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>.
However, the <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> class contains many functions and does not have a clear interface.
It contains <codeclass="docutils literal"><spanclass="pre">create/store</span><spanclass="pre">Parameter</span></code>, <codeclass="docutils literal"><spanclass="pre">serialize/deserialize</span></code>, <codeclass="docutils literal"><spanclass="pre">optimize(i.e</span><spanclass="pre">SGD)</span></code>, <codeclass="docutils literal"><spanclass="pre">randomize/zero</span></code>.
When we developing <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>, we only use <codeclass="docutils literal"><spanclass="pre">create/store</span><spanclass="pre">Parameter</span></code> functionality.
We should extract functionalities of Parameter into many classes to clean Paddle CPP implementation.</li>
<li><codeclass="docutils literal"><spanclass="pre">paddle::GradientMachine</span></code> and its sub-classes, e.g., <codeclass="docutils literal"><spanclass="pre">paddle::MultiGradientMachine</span></code>, <codeclass="docutils literal"><spanclass="pre">paddle::NeuralNetwork</span></code>.
We should pass <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> to <codeclass="docutils literal"><spanclass="pre">paddle::GradientMachine</span></code> when <codeclass="docutils literal"><spanclass="pre">forward/backward</span></code> to avoid <codeclass="docutils literal"><spanclass="pre">memcpy</span></code> between topologies.
Also, we should handle multi-GPU/CPU training, because <codeclass="docutils literal"><spanclass="pre">forward</span></code> and <codeclass="docutils literal"><spanclass="pre">backward</span></code> would perform on multi-GPUs and multi-CPUs.
<codeclass="docutils literal"><spanclass="pre">Parameters</span></code> should dispatch the parameter value to each device, and gather the parameter gradient from each device.</li>
<li><codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code>. The ParameterUpdater is used to update parameters in Paddle.
So <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> should be used by <codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code>, and <codeclass="docutils literal"><spanclass="pre">paddle::ParameterUpdater</span></code> should optimize <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> (by SGD).</li>
</ol>
<p>The step by step approach for implementation Parameters in Paddle C++ core is listed below. Each step should be a PR and could be merged into Paddle one by one.</p>
<olclass="simple">
<li>Clean <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> interface. Extract the functionalities of <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> to prepare for the implementation of Parameters.</li>
<li>Implementation a <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> class. It just stores the <codeclass="docutils literal"><spanclass="pre">paddle::Parameter</span></code> inside. Make <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as a class member.</li>
<li>Make <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> support Multi-CPU and Multi-GPU training to prepare for sharing <codeclass="docutils literal"><spanclass="pre">Parameter</span></code> between topologies.
Because we need share <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> between topologies, it is <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>‘s response to exchange Parameters between GPUs.
<codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> should not handle how to exchange Parameters because <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> only used to train one topology and we need to support train many topologies in Paddle, i.e., there could be many GradientMachines use one <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>.<ul>
<li>We should use a global function to exchange Parameters between GPUs, not a member function in <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>. The <codeclass="docutils literal"><spanclass="pre">MultiGradientMachine</span></code> invoke this function, which uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as this function inputs.</li>
<li>The MultiGradientMachine contains many functionalities. Extracting the Parameters exchanging logic could make MultiGradientMachine clearer and simpler.</li>
</ul>
</li>
<li>Make <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> as an argument for <codeclass="docutils literal"><spanclass="pre">forward/backward</span></code> function, not a data member for <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code>. For example, <codeclass="docutils literal"><spanclass="pre">forward</span></code> could be <codeclass="docutils literal"><spanclass="pre">forward(const</span><spanclass="pre">Parameters&</span><spanclass="pre">params,</span><spanclass="pre">...)</span></code> and <codeclass="docutils literal"><spanclass="pre">backward</span></code> could be <codeclass="docutils literal"><spanclass="pre">backward(Parameters*</span><spanclass="pre">params,</span><spanclass="pre">...)</span></code>. After this step, Paddle could share <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> between topologies.</li>
<li><codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code> is invoked by <codeclass="docutils literal"><spanclass="pre">GradientMachine</span></code> and <codeclass="docutils literal"><spanclass="pre">Trainer</span></code>, but it updates <codeclass="docutils literal"><spanclass="pre">Parameters</span></code>. In the end of this code refactoring, we could change <codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code> directly uses <codeclass="docutils literal"><spanclass="pre">Parameters</span></code> to make <codeclass="docutils literal"><spanclass="pre">ParameterUpdater</span></code>‘s implementation clear.</li>
Built with <ahref="http://sphinx-doc.org/">Sphinx</a> using a <ahref="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <ahref="https://readthedocs.org">Read the Docs</a>.