Commit 42e29e07 authored by chrisxu2014, committed by GitHub

Merge branch 'gh-pages' into gh-pages

@@ -15,7 +15,6 @@
.norsTitle {font-size: 22px; font-family: Microsoft Yahei; font-weight: normal; color: #333; margin: 35px 0 25px 0; }
</style>
</head>
<body link="#0000cc">
<div id="wrapper_wrapper">
<div id="content_left">
......
@@ -10,7 +10,7 @@ A dataset is a list of files in *RecordIO* format. A RecordIO file consists of c
## Task Queue
As mentioned in [distributed training design doc](./README.md), a *task* is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple *chunks* from one or multiple files. The master server maintains *task queues* to track the training progress.
### Task Queue Creation
@@ -21,23 +21,23 @@ As mentioned in [distributed training design doc](./README.md), a *task* is a da
func (m *RPCServer) ReportDataset(Paths []string, dummy *int) error {
}
```
1. The master server will scan through each RecordIO file to generate the *chunk index* and learn how many chunks each file has. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is an in-memory data structure that enables fast access to each chunk, and the index of the chunk within the file is an integer starting from 0, representing the n-th chunk within the file.
   The definition of the chunk is:
```go
type Chunk struct {
	Idx   int            // index of the chunk within the file
	Path  string
	Index recordio.Index // chunk index
}
```
1. Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized empty. (A sketch of the grouping step follows the `Task` definition below.)
   The definition of the task is:
```go
type Task struct {
	Index  int
	Chunks []Chunk
}
```
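As a hedged illustration of the grouping step: the actual master server is written in Go, so the following C++ sketch is only the algorithm made concrete; `chunksPerTask`, `makeTodoQueue`, and the opaque `Index` stand-in are assumptions, not Paddle code.

```cpp
#include <deque>
#include <string>
#include <vector>

struct Index {};  // opaque stand-in for recordio.Index
struct Chunk { int idx; std::string path; Index index; };
struct Task  { int index; std::vector<Chunk> chunks; };

// Pack chunks into tasks of at most chunksPerTask chunks each and fill
// the todo queue; the pending and done queues start out empty.
std::deque<Task> makeTodoQueue(const std::vector<Chunk>& chunks,
                               int chunksPerTask) {
  std::deque<Task> todo;
  Task cur{0, {}};
  for (const Chunk& c : chunks) {
    cur.chunks.push_back(c);
    if (static_cast<int>(cur.chunks.size()) == chunksPerTask) {
      todo.push_back(cur);
      cur = Task{cur.index + 1, {}};
    }
  }
  if (!cur.chunks.empty()) todo.push_back(cur);  // trailing partial task
  return todo;
}
```

For example, with `chunksPerTask = 8`, 20 chunks would yield tasks of sizes 8, 8, and 4.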
......
@@ -55,7 +55,7 @@ The trainer select process is encapsulated in the C API function:
```c
int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);
```
The selected trainer's call to `paddle_begin_init_params` will return 1, and the other trainers' calls to `paddle_begin_init_params` will return 0. `paddle_get_params` will block until initialization is complete. As illustrated below:
<img src="./src/pserver_init.png">
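To make the flow around `paddle_begin_init_params` concrete, below is a hedged C++ sketch of a trainer's start-up sequence. The prototypes follow the API listing later in this document; the exact shapes of `paddle_finish_init_params` and `paddle_get_params` are assumptions reconstructed from their parameter descriptions, and `paddle_parameter` is reduced to an opaque placeholder.

```cpp
struct paddle_pserver_client;
struct paddle_parameter {};  // placeholder; the real layout is not shown here

extern "C" {
int paddle_begin_init_params(paddle_pserver_client* client);
int paddle_init_param(paddle_pserver_client* client, paddle_parameter param,
                      const unsigned char* param_config_proto, int config_len);
int paddle_finish_init_params(paddle_pserver_client* client);  // assumed signature
int paddle_get_params(paddle_pserver_client* client, const char** names,
                      paddle_parameter** dst, int len);        // assumed signature
}

void trainer_startup(paddle_pserver_client* client,
                     const paddle_parameter* local, const unsigned char** cfg,
                     const int* cfg_len, const char** names,
                     paddle_parameter** dst, int n) {
  while (paddle_begin_init_params(client)) {
    // Selected trainer: push locally initialized parameters to the
    // parameter servers. On failure, the doc says to restart the whole
    // initialization process, hence the enclosing loop.
    bool ok = true;
    for (int i = 0; i < n && ok; ++i)
      ok = (paddle_init_param(client, local[i], cfg[i], cfg_len[i]) == 0);
    if (ok) { paddle_finish_init_params(client); break; }
  }
  // Every trainer, selected or not, fetches the parameters; this call
  // blocks until initialization has completed on the parameter servers.
  paddle_get_params(client, names, dst, n);
}
```

This matches the diagram's two paths: the selected trainer pushes parameters and finishes initialization, while every other trainer proceeds straight to the blocking `paddle_get_params`.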
@@ -89,16 +89,13 @@ void paddle_pserver_client_release(paddle_pserver_client* client);
 *
 * paddle_begin_init_params will be called from multiple trainers,
 * only one trainer will be selected to initialize the parameters on
 * parameter servers. Other trainers need to get the initialized
 * parameters from parameter servers using @paddle_get_params.
 *
 * @return 1 if the trainer is selected to initialize parameter
 * servers, otherwise 0.
 */
int paddle_begin_init_params(paddle_pserver_client* client);
/**
 * @brief paddle_init_param initializes the parameter on parameter
@@ -106,12 +103,13 @@ int paddle_begin_init_params(paddle_pserver_client* client, const char* pserver_
 *
 * @param param the parameter to initialize.
 * @param param_config_proto the configuration for the parameter.
 * @param config_len the length of param_config_proto
 * @return 0 if successful, otherwise -1. On failure, the trainer
 * needs to restart the entire initialization process (starting from
 * @paddle_begin_init_param). Or simply exit the program and wait for
 * the cluster management system to restart the trainer.
 */
int paddle_init_param(paddle_pserver_client* client, paddle_parameter param, const unsigned char* param_config_proto, int config_len);
/**
 * @brief paddle_finish_init_params tells parameter servers client has
@@ -138,6 +136,9 @@ int paddle_send_grads(paddle_pserver_client* client, const paddle_gradient* grad
/**
 * @brief paddle_get_params gets parameters from parameter servers.
 *
 * paddle_get_params will block until parameters are initialized on
 * the parameter servers.
 *
 * @param names the array of names of the parameters to get.
 * @param dst the destination array of parameters to save to.
 * @param len the length of the names array and the paddle_parameter
......
# Design Doc: The C++ Class `Parameters`
`Parameters` is a concept we designed for the Paddle V2 API. `Parameters` is a container of parameters; it lets Paddle share parameters between topologies. We described the usage of `Parameter` in [api.md](./api.md).

We implemented `Parameters` in Python when designing the V2 API. The current implementation has several defects:

* We just use `memcpy` to share `Parameters` between topologies, which is very inefficient.
* We did not implement sharing `Parameters` during training; we just trigger a `memcpy` when training starts.

It is necessary to implement `Parameters` on the C++ side. However, this amounts to a code refactoring of Paddle, because Paddle was previously designed to train only one topology, i.e., each `GradientMachine` contains its `Parameter` as a data member. In the current Paddle implementation, there are three concepts associated with `Parameters`:

1. `paddle::Parameter`. A `Parameters` is a container of `paddle::Parameter`s.
   It is evident that we should use `paddle::Parameter` when developing `Parameters`.
   However, the `Parameter` class contains many functions and does not have a clear interface.
   It contains `create/store Parameter`, `serialize/deserialize`, `optimize (i.e., SGD)`, and `randomize/zero`.
   When developing `Parameters`, we only use the `create/store Parameter` functionality.
   We should extract the functionalities of `Parameter` into several classes to clean up the Paddle C++ implementation.
2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`.
   We should pass `Parameters` to `paddle::GradientMachine` during `forward/backward` to avoid `memcpy` between topologies.
   Also, we should handle multi-GPU/CPU training, because `forward` and `backward` would run on multiple GPUs and CPUs.
   `Parameters` should dispatch the parameter value to each device and gather the parameter gradient from each device.
3. `paddle::ParameterUpdater`. The `ParameterUpdater` is used to update parameters in Paddle.
   So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (by SGD).

The step-by-step approach for implementing `Parameters` in the Paddle C++ core is listed below. Each step should be a PR and can be merged into Paddle one by one.

1. Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of `Parameters`.
2. Implement a `Parameters` class that simply stores the `paddle::Parameter`s inside. Make `GradientMachine` use `Parameters` as a class member.
3. Make `Parameters` support multi-CPU and multi-GPU training, to prepare for sharing `Parameter`s between topologies.
   Because we need to share `Parameters` between topologies, it is `Parameters`' responsibility to exchange parameters between GPUs.
   `GradientMachine` should not handle how to exchange parameters, because a `GradientMachine` only trains one topology and we need to support training many topologies in Paddle, i.e., there could be many `GradientMachine`s using one `Parameters` instance.
   * We should use a global function to exchange parameters between GPUs, not a member function of `Parameters`. `MultiGradientMachine` invokes this function, which takes `Parameters` as input.
   * `MultiGradientMachine` contains many functionalities. Extracting the parameter-exchange logic would make `MultiGradientMachine` clearer and simpler.
4. Make `Parameters` an argument of the `forward/backward` functions, not a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle can share `Parameters` between topologies (see the sketch after this list).
5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. At the end of this refactoring, we could change `ParameterUpdater` to use `Parameters` directly, making `ParameterUpdater`'s implementation clear.
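To make steps 2 and 4 concrete, here is a hedged C++ sketch of a minimal `Parameters` container and the proposed `forward`/`backward` signatures. Only `Parameters`, `GradientMachine`, and `paddle::Parameter` come from this doc; the member names and `std::shared_ptr` storage are illustrative assumptions.

```cpp
#include <map>
#include <memory>
#include <string>

namespace paddle {

class Parameter;  // the existing per-parameter class

// Step 2: Parameters is just a named container of paddle::Parameter.
class Parameters {
 public:
  std::shared_ptr<Parameter> get(const std::string& name) const {
    auto it = params_.find(name);
    return it == params_.end() ? nullptr : it->second;
  }
  void set(const std::string& name, std::shared_ptr<Parameter> p) {
    params_[name] = std::move(p);
  }

 private:
  std::map<std::string, std::shared_ptr<Parameter>> params_;
};

// Step 4: forward/backward take Parameters as an argument instead of
// GradientMachine owning them, so several topologies can share one
// Parameters instance. Argument lists are abbreviated ("...") in the
// doc and reduced to the essentials here.
class GradientMachine {
 public:
  virtual ~GradientMachine() = default;
  virtual void forward(const Parameters& params /*, inputs, outputs */) = 0;
  virtual void backward(Parameters* params /*, callbacks */) = 0;
};

}  // namespace paddle
```

Passing `const Parameters&` to `forward` and `Parameters*` to `backward` mirrors the read-only versus mutating roles the doc assigns to the two passes.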
@@ -185,12 +185,50 @@
<h1>Data Reader Interface and DataSets<a class="headerlink" href="#data-reader-interface-and-datasets" title="Permalink to this headline"></a></h1>
<div class="section" id="datatypes">
<h2>DataTypes<a class="headerlink" href="#datatypes" title="Permalink to this headline"></a></h2>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.data_type.</code><code class="descname">dense_array</code><span class="sig-paren">(</span><em>dim</em>, <em>seq_type=0</em><span class="sig-paren">)</span></dt>
<dd><p>Dense Array. It means the input feature is a dense array of float type.
For example, if the input is an image with 28*28 pixels, the input of a
Paddle neural network could be a dense vector with dimension 784 or a
numpy array with shape (28, 28).</p>
<p>For the 2-D convolution operation, each sample in one mini-batch must
currently have the same size in PaddlePaddle, but variable-dimension
features across mini-batches are supported. For variable dimensions, the
param dim is not used; the data reader must yield numpy arrays, and the
data feeder will set the data shape correctly.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>dim</strong> (<em>int</em>) &#8211; dimension of this vector.</li>
<li><strong>seq_type</strong> (<em>int</em>) &#8211; sequence type of input.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">An input type object.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">InputType</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.data_type.</code><code class="descname">dense_vector</code><span class="sig-paren">(</span><em>dim</em>, <em>seq_type=0</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.data_type.</code><code class="descname">dense_vector</code><span class="sig-paren">(</span><em>dim</em>, <em>seq_type=0</em><span class="sig-paren">)</span></dt>
<dd><p>Dense Array. It means the input feature is a dense array of float type.
For example, if the input is an image with 28*28 pixels, the input of a
Paddle neural network could be a dense vector with dimension 784 or a
numpy array with shape (28, 28).</p>
<p>For the 2-D convolution operation, each sample in one mini-batch must
currently have the same size in PaddlePaddle, but variable-dimension
features across mini-batches are supported. For variable dimensions, the
param dim is not used; the data reader must yield numpy arrays, and the
data feeder will set the data shape correctly.</p>
<table class="docutils field-list" frame="void" rules="none"> <table class="docutils field-list" frame="void" rules="none">
<col class="field-name" /> <col class="field-name" />
<col class="field-body" /> <col class="field-body" />
......
@@ -186,7 +186,7 @@
</div>
<div class="section" id="task-queue">
<span id="task-queue"></span><h2>Task Queue<a class="headerlink" href="#task-queue" title="Permalink to this headline"></a></h2>
<p>As mentioned in <a class="reference internal" href="README.html"><span class="doc">distributed training design doc</span></a>, a <em>task</em> is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple <em>chunks</em> from one or multiple files. The master server maintains <em>task queues</em> to track the training progress.</p>
<div class="section" id="task-queue-creation">
<span id="task-queue-creation"></span><h3>Task Queue Creation<a class="headerlink" href="#task-queue-creation" title="Permalink to this headline"></a></h3>
<ol>
@@ -197,21 +197,21 @@
</pre></div>
</div>
</li>
<li><p class="first">The master server will scan through each RecordIO file to generate the <em>block index</em> and know how many blocks does each file have. A block can be referenced by the file path and the index of the block within the file. The block index is in memory data structure that enables fast access to each block, and the index of the block with the file is an integer start from 0, representing the n-th block within the file.</p> <li><p class="first">The master server will scan through each RecordIO file to generate the <em>chunk index</em> and know how many chunks does each file have. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is in memory data structure that enables fast access to each chunk, and the index of the chunk with the file is an integer start from 0, representing the n-th chunk within the file.</p>
<p>The definition of the block is:</p> <p>The definition of the chunk is:</p>
<div class="highlight-go"><div class="highlight"><pre><span></span><span class="kd">type</span> <span class="nx">Block</span> <span class="kd">struct</span> <span class="p">{</span> <div class="highlight-go"><div class="highlight"><pre><span></span><span class="kd">type</span> <span class="nx">Chunk</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">Idx</span> <span class="kt">int</span> <span class="c1">// index of the block within the file</span> <span class="nx">Idx</span> <span class="kt">int</span> <span class="c1">// index of the chunk within the file</span>
<span class="nx">Path</span> <span class="kt">string</span> <span class="nx">Path</span> <span class="kt">string</span>
<span class="nx">Index</span> <span class="nx">recordio</span><span class="p">.</span><span class="nx">Index</span> <span class="c1">// block index</span> <span class="nx">Index</span> <span class="nx">recordio</span><span class="p">.</span><span class="nx">Index</span> <span class="c1">// chunk index</span>
<span class="p">}</span> <span class="p">}</span>
</pre></div> </pre></div>
</div> </div>
</li> </li>
<li><p class="first">Blocks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.</p> <li><p class="first">Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.</p>
<p>The definition of the task is:</p> <p>The definition of the task is:</p>
<div class="highlight-go"><div class="highlight"><pre><span></span><span class="kd">type</span> <span class="nx">Task</span> <span class="kd">struct</span> <span class="p">{</span> <div class="highlight-go"><div class="highlight"><pre><span></span><span class="kd">type</span> <span class="nx">Task</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">Index</span> <span class="kt">int</span> <span class="nx">Index</span> <span class="kt">int</span>
<span class="nx">Blocks</span> <span class="p">[]</span><span class="nx">Block</span> <span class="nx">Chunks</span> <span class="p">[]</span><span class="nx">Chunk</span>
<span class="p">}</span> <span class="p">}</span>
</pre></div> </pre></div>
</div> </div>
......
@@ -226,7 +226,7 @@ name:sparse-n-1
<div class="highlight-c"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">paddle_begin_init_params</span><span class="p">(</span><span class="n">paddle_pserver_client</span><span class="o">*</span> <span class="n">client</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">config_proto</span><span class="p">);</span>
</pre></div>
</div>
<p>The selected trainer&#8217;s call to <code class="docutils literal"><span class="pre">paddle_begin_init_params</span></code> will return 1, and the other trainers&#8217; calls to <code class="docutils literal"><span class="pre">paddle_begin_init_params</span></code> will return 0. <code class="docutils literal"><span class="pre">paddle_get_params</span></code> will block until initialization is complete. As illustrated below:</p>
<p><img src="./src/pserver_init.png"></p>
</div>
</div>
@@ -259,16 +259,13 @@ name:sparse-n-1
<span class="cm"> *</span>
<span class="cm"> * paddle_begin_init_params will be called from multiple trainers,</span>
<span class="cm"> * only one trainer will be selected to initialize the parameters on</span>
<span class="cm"> * parameter servers. Other trainers need to get the initialized</span>
<span class="cm"> * parameters from parameter servers using @paddle_get_params.</span>
<span class="cm"> *</span>
<span class="cm"> * @return 1 if the trainer is selected to initialize parameter</span>
<span class="cm"> * servers, otherwise 0.</span>
<span class="cm"> */</span>
<span class="kt">int</span> <span class="nf">paddle_begin_init_params</span><span class="p">(</span><span class="n">paddle_pserver_client</span><span class="o">*</span> <span class="n">client</span><span class="p">);</span>
<span class="cm">/**</span>
<span class="cm"> * @brief paddle_init_param initializes the parameter on parameter</span>
@@ -276,12 +273,13 @@ name:sparse-n-1
<span class="cm"> *</span>
<span class="cm"> * @param param the parameter to initialize.</span>
<span class="cm"> * @param param_config_proto the configuration for the parameter.</span>
<span class="cm"> * @param config_len the length of param_config_proto</span>
<span class="cm"> * @return 0 if successful, otherwise -1. On failure, the trainer</span>
<span class="cm"> * needs to restart the entire initialization process (starting from</span>
<span class="cm"> * @paddle_begin_init_param). Or simply exit the program and wait for</span>
<span class="cm"> * the cluster management system to restart the trainer.</span>
<span class="cm"> */</span>
<span class="kt">int</span> <span class="nf">paddle_init_param</span><span class="p">(</span><span class="n">paddle_pserver_client</span><span class="o">*</span> <span class="n">client</span><span class="p">,</span> <span class="n">paddle_parameter</span> <span class="n">param</span><span class="p">,</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span><span class="o">*</span> <span class="n">param_config_proto</span><span class="p">,</span> <span class="kt">int</span> <span class="n">config_len</span><span class="p">);</span>
<span class="cm">/**</span>
<span class="cm"> * @brief paddle_finish_init_params tells parameter servers client has</span>
@@ -308,6 +306,9 @@ name:sparse-n-1
<span class="cm">/**</span>
<span class="cm"> * @brief paddle_get_params gets parameters from parameter servers.</span>
<span class="cm"> *</span>
<span class="cm"> * paddle_get_params will block until parameters are initialized on</span>
<span class="cm"> * the parameter servers.</span>
<span class="cm"> *</span>
<span class="cm"> * @param names the array of names of the parameters to get.</span>
<span class="cm"> * @param dst the destination array of parameters to save to.</span>
<span class="cm"> * @param len the length of the names array and the paddle_parameter</span>
......
因为 它太大了无法显示 source diff 。你可以改为 查看blob
...@@ -10,7 +10,7 @@ A dataset is a list of files in *RecordIO* format. A RecordIO file consists of c ...@@ -10,7 +10,7 @@ A dataset is a list of files in *RecordIO* format. A RecordIO file consists of c
## Task Queue ## Task Queue
As mentioned in [distributed training design doc](./README.md), a *task* is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple *blocks* from one or multiple files. The master server maintains *task queues* to track the training progress. As mentioned in [distributed training design doc](./README.md), a *task* is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple *chunks* from one or multiple files. The master server maintains *task queues* to track the training progress.
### Task Queue Creation ### Task Queue Creation
...@@ -21,23 +21,23 @@ As mentioned in [distributed training design doc](./README.md), a *task* is a da ...@@ -21,23 +21,23 @@ As mentioned in [distributed training design doc](./README.md), a *task* is a da
func (m *RPCServer) ReportDataset(Paths []string, dummy *int) error { func (m *RPCServer) ReportDataset(Paths []string, dummy *int) error {
} }
``` ```
1. The master server will scan through each RecordIO file to generate the *block index* and know how many blocks does each file have. A block can be referenced by the file path and the index of the block within the file. The block index is in memory data structure that enables fast access to each block, and the index of the block with the file is an integer start from 0, representing the n-th block within the file. 1. The master server will scan through each RecordIO file to generate the *chunk index* and know how many chunks does each file have. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is in memory data structure that enables fast access to each chunk, and the index of the chunk with the file is an integer start from 0, representing the n-th chunk within the file.
The definition of the block is: The definition of the chunk is:
```go ```go
type Block struct { type Chunk struct {
Idx int // index of the block within the file Idx int // index of the chunk within the file
Path string Path string
Index recordio.Index // block index Index recordio.Index // chunk index
} }
``` ```
1. Blocks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element. 1. Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.
The definition of the task is: The definition of the task is:
```go ```go
type Task struct { type Task struct {
Index int Index int
Blocks []Block Chunks []Chunk
} }
``` ```
......
...@@ -55,7 +55,7 @@ The trainer select process is encapsulated in the C API function: ...@@ -55,7 +55,7 @@ The trainer select process is encapsulated in the C API function:
```c ```c
int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto); int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);
``` ```
The selected trainer's call to `paddle_begin_init_params` will return with 1, and the other trainers' call to `paddle_begin_init_params` will block until initialization is done, and return 0. As illustrated below: The selected trainer's call to `paddle_begin_init_params` will return with 1, and the other trainers' call to `paddle_begin_init_params` will return 0. `paddle_get_params` will be blocked until initialization is completed. As illustrated below:
<img src="./src/pserver_init.png"> <img src="./src/pserver_init.png">
...@@ -89,16 +89,13 @@ void paddle_pserver_client_release(paddle_pserver_client* client); ...@@ -89,16 +89,13 @@ void paddle_pserver_client_release(paddle_pserver_client* client);
* *
* paddle_begin_init_params will be called from multiple trainers, * paddle_begin_init_params will be called from multiple trainers,
* only one trainer will be selected to initialize the parameters on * only one trainer will be selected to initialize the parameters on
* parameter servers. Other trainers will be blocked until the * parameter servers. Other trainers need to get the initialized
* initialization is done, and they need to get the initialized
* parameters from parameter servers using @paddle_get_params. * parameters from parameter servers using @paddle_get_params.
* *
* @param pserver_config_proto serialized parameter server configuration in
* Protocol Buffers format.
* @return 1 if the trainer is selected to initialize parameter * @return 1 if the trainer is selected to initialize parameter
* servers, otherwise 0. * servers, otherwise 0.
*/ */
int paddle_begin_init_params(paddle_pserver_client* client, const char* pserver_config_proto); int paddle_begin_init_params(paddle_pserver_client* client);
/** /**
* @brief paddle_init_param initializes the parameter on parameter * @brief paddle_init_param initializes the parameter on parameter
...@@ -106,12 +103,13 @@ int paddle_begin_init_params(paddle_pserver_client* client, const char* pserver_ ...@@ -106,12 +103,13 @@ int paddle_begin_init_params(paddle_pserver_client* client, const char* pserver_
* *
* @param param the parameter to initialize. * @param param the parameter to initialize.
* @param param_config_proto the configuration for the parameter. * @param param_config_proto the configuration for the parameter.
* @param config_len the length of param_config_proto
* @return 0 if successful, otherwise -1. On failure, the trainer * @return 0 if successful, otherwise -1. On failure, the trainer
* needs to restart the entire initialization process (starting from * needs to restart the entire initialization process (starting from
* @paddle_begin_init_param). Or simply exit the program and wait for * @paddle_begin_init_param). Or simply exit the program and wait for
* the cluster management system to restart the trainer. * the cluster management system to restart the trainer.
*/ */
int paddle_init_param(paddle_pserver_client* client, paddle_parameter params, const char* param_config_proto); int paddle_init_param(paddle_pserver_client* client, paddle_parameter param, const unsigned char* param_config_proto, int config_len);
/** /**
* @brief paddle_finish_init_params tells parameter servers client has * @brief paddle_finish_init_params tells parameter servers client has
...@@ -138,6 +136,9 @@ int paddle_send_grads(paddle_pserver_client* client, const paddle_gradient* grad ...@@ -138,6 +136,9 @@ int paddle_send_grads(paddle_pserver_client* client, const paddle_gradient* grad
/** /**
* @brief paddle_get_params gets parameters from parameter servers. * @brief paddle_get_params gets parameters from parameter servers.
* *
* paddle_get_params will block until parameters are initialized on
* the parameter servers.
*
* @param names the array of names of the parameters to get. * @param names the array of names of the parameters to get.
* @param dst the destination array of parameters to save to. * @param dst the destination array of parameters to save to.
* @param len the length of the names array and the paddle_parameter * @param len the length of the names array and the paddle_parameter
......
# Design Doc: The C++ Class `Parameters`
`Parameters` is a concept we designed in Paddle V2 API. `Parameters` is a container of parameters, and make Paddle can shared parameter between topologies. We described usages of `Parameter` in [api.md](./api.md).
We used Python to implement Parameters when designing V2 API before. There are several defects for current implementation:
* We just use `memcpy` to share Parameters between topologies, but this is very inefficient.
* We did not implement share Parameters while training. We just trigger `memcpy` when start training.
It is necessary that we implement Parameters in CPP side. However, it could be a code refactoring for Paddle, because Paddle was designed for training only one topology before, i.e., each GradientMachine contains its Parameter as a data member. In current Paddle implementation, there are three concepts associated with `Parameters`:
1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`.
It is evident that we should use `paddle::Parameter` when developing `Parameters`.
However, the `Parameter` class contains many functions and does not have a clear interface.
It contains `create/store Parameter`, `serialize/deserialize`, `optimize(i.e SGD)`, `randomize/zero`.
When we developing `Parameters`, we only use `create/store Parameter` functionality.
We should extract functionalities of Parameter into many classes to clean Paddle CPP implementation.
2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`.
We should pass `Parameters` to `paddle::GradientMachine` when `forward/backward` to avoid `memcpy` between topologies.
Also, we should handle multi-GPU/CPU training, because `forward` and `backward` would perform on multi-GPUs and multi-CPUs.
`Parameters` should dispatch the parameter value to each device, and gather the parameter gradient from each device.
3. `paddle::ParameterUpdater`. The ParameterUpdater is used to update parameters in Paddle.
So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (by SGD).
The step by step approach for implementation Parameters in Paddle C++ core is listed below. Each step should be a PR and could be merged into Paddle one by one.
1. Clean `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of Parameters.
2. Implementation a `Parameters` class. It just stores the `paddle::Parameter` inside. Make `GradientMachine` uses `Parameters` as a class member.
3. Make `Parameters` support Multi-CPU and Multi-GPU training to prepare for sharing `Parameter` between topologies.
Because we need share `Parameters` between topologies, it is `Parameters`'s response to exchange Parameters between GPUs.
`GradientMachine` should not handle how to exchange Parameters because `GradientMachine` only used to train one topology and we need to support train many topologies in Paddle, i.e., there could be many GradientMachines use one `Parameters`.
* We should use a global function to exchange Parameters between GPUs, not a member function in `Parameters`. The `MultiGradientMachine` invoke this function, which uses `Parameters` as this function inputs.
* The MultiGradientMachine contains many functionalities. Extracting the Parameters exchanging logic could make MultiGradientMachine clearer and simpler.
4. Make `Parameters` as an argument for `forward/backward` function, not a data member for `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle could share `Parameters` between topologies.
5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. In the end of this code refactoring, we could change `ParameterUpdater` directly uses `Parameters` to make `ParameterUpdater`'s implementation clear.
...@@ -192,12 +192,50 @@ ...@@ -192,12 +192,50 @@
<h1>Data Reader Interface and DataSets<a class="headerlink" href="#data-reader-interface-and-datasets" title="永久链接至标题"></a></h1> <h1>Data Reader Interface and DataSets<a class="headerlink" href="#data-reader-interface-and-datasets" title="永久链接至标题"></a></h1>
<div class="section" id="datatypes"> <div class="section" id="datatypes">
<h2>DataTypes<a class="headerlink" href="#datatypes" title="永久链接至标题"></a></h2> <h2>DataTypes<a class="headerlink" href="#datatypes" title="永久链接至标题"></a></h2>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.data_type.</code><code class="descname">dense_array</code><span class="sig-paren">(</span><em>dim</em>, <em>seq_type=0</em><span class="sig-paren">)</span></dt>
<dd><p>Dense Array. It means the input feature is a dense array of floats.
For example, if the input is an image with 28*28 pixels, the input of the
Paddle neural network could be a dense vector with dimension 784 or a
numpy array with shape (28, 28).</p>
<p>For the 2-D convolution operation, each sample in one mini-batch must
currently have the same size in PaddlePaddle. However, variable-dimension
features across mini-batches are supported. For variable-dimension input,
the param dim is not used; instead, the data reader must yield numpy
arrays, and the data feeder will set the data shape correctly.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>dim</strong> (<em>int</em>) &#8211; dimension of this vector.</li>
<li><strong>seq_type</strong> (<em>int</em>) &#8211; sequence type of input.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">An input type object.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">InputType</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function"> <dl class="function">
<dt> <dt>
<code class="descclassname">paddle.v2.data_type.</code><code class="descname">dense_vector</code><span class="sig-paren">(</span><em>dim</em>, <em>seq_type=0</em><span class="sig-paren">)</span></dt> <code class="descclassname">paddle.v2.data_type.</code><code class="descname">dense_vector</code><span class="sig-paren">(</span><em>dim</em>, <em>seq_type=0</em><span class="sig-paren">)</span></dt>
<dd><p>Dense Vector. It means the input feature is dense float vector. For example, <dd><p>Dense Array. It means the input feature is a dense array of floats.
if the input is an image with 28*28 pixels, the input of Paddle neural For example, if the input is an image with 28*28 pixels, the input of the
network should be a dense vector with dimension 784.</p> Paddle neural network could be a dense vector with dimension 784 or a
numpy array with shape (28, 28).</p>
<p>For the 2-D convolution operation, each sample in one mini-batch must
currently have the same size in PaddlePaddle. However, variable-dimension
features across mini-batches are supported. For variable-dimension input,
the param dim is not used; instead, the data reader must yield numpy
arrays, and the data feeder will set the data shape correctly.</p>
<table class="docutils field-list" frame="void" rules="none"> <table class="docutils field-list" frame="void" rules="none">
<col class="field-name" /> <col class="field-name" />
<col class="field-body" /> <col class="field-body" />
......
...@@ -193,7 +193,7 @@ ...@@ -193,7 +193,7 @@
</div> </div>
<div class="section" id="task-queue"> <div class="section" id="task-queue">
<span id="task-queue"></span><h2>Task Queue<a class="headerlink" href="#task-queue" title="永久链接至标题"></a></h2> <span id="task-queue"></span><h2>Task Queue<a class="headerlink" href="#task-queue" title="永久链接至标题"></a></h2>
<p>As mentioned in <a class="reference internal" href="README.html"><span class="doc">distributed training design doc</span></a>, a <em>task</em> is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple <em>blocks</em> from one or multiple files. The master server maintains <em>task queues</em> to track the training progress.</p> <p>As mentioned in <a class="reference internal" href="README.html"><span class="doc">distributed training design doc</span></a>, a <em>task</em> is a data shard that the master server assigns to the trainer process to train on. A task consists of one or multiple <em>chunks</em> from one or multiple files. The master server maintains <em>task queues</em> to track the training progress.</p>
<div class="section" id="task-queue-creation"> <div class="section" id="task-queue-creation">
<span id="task-queue-creation"></span><h3>Task Queue Creation<a class="headerlink" href="#task-queue-creation" title="永久链接至标题"></a></h3> <span id="task-queue-creation"></span><h3>Task Queue Creation<a class="headerlink" href="#task-queue-creation" title="永久链接至标题"></a></h3>
<ol> <ol>
...@@ -204,21 +204,21 @@ ...@@ -204,21 +204,21 @@
</pre></div> </pre></div>
</div> </div>
</li> </li>
<li><p class="first">The master server will scan through each RecordIO file to generate the <em>block index</em> and know how many blocks does each file have. A block can be referenced by the file path and the index of the block within the file. The block index is in memory data structure that enables fast access to each block, and the index of the block with the file is an integer start from 0, representing the n-th block within the file.</p> <li><p class="first">The master server will scan through each RecordIO file to generate the <em>chunk index</em> and know how many chunks does each file have. A chunk can be referenced by the file path and the index of the chunk within the file. The chunk index is in memory data structure that enables fast access to each chunk, and the index of the chunk with the file is an integer start from 0, representing the n-th chunk within the file.</p>
<p>The definition of the block is:</p> <p>The definition of the chunk is:</p>
<div class="highlight-go"><div class="highlight"><pre><span></span><span class="kd">type</span> <span class="nx">Block</span> <span class="kd">struct</span> <span class="p">{</span> <div class="highlight-go"><div class="highlight"><pre><span></span><span class="kd">type</span> <span class="nx">Chunk</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">Idx</span> <span class="kt">int</span> <span class="c1">// index of the block within the file</span> <span class="nx">Idx</span> <span class="kt">int</span> <span class="c1">// index of the chunk within the file</span>
<span class="nx">Path</span> <span class="kt">string</span> <span class="nx">Path</span> <span class="kt">string</span>
<span class="nx">Index</span> <span class="nx">recordio</span><span class="p">.</span><span class="nx">Index</span> <span class="c1">// block index</span> <span class="nx">Index</span> <span class="nx">recordio</span><span class="p">.</span><span class="nx">Index</span> <span class="c1">// chunk index</span>
<span class="p">}</span> <span class="p">}</span>
</pre></div> </pre></div>
</div> </div>
</li> </li>
<li><p class="first">Blocks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.</p> <li><p class="first">Chunks are grouped into tasks, and tasks are filled into the todo queue. The pending queue and the done queue are initialized with no element.</p>
<p>The definition of the task is:</p> <p>The definition of the task is:</p>
<div class="highlight-go"><div class="highlight"><pre><span></span><span class="kd">type</span> <span class="nx">Task</span> <span class="kd">struct</span> <span class="p">{</span> <div class="highlight-go"><div class="highlight"><pre><span></span><span class="kd">type</span> <span class="nx">Task</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">Index</span> <span class="kt">int</span> <span class="nx">Index</span> <span class="kt">int</span>
<span class="nx">Blocks</span> <span class="p">[]</span><span class="nx">Block</span> <span class="nx">Chunks</span> <span class="p">[]</span><span class="nx">Chunk</span>
<span class="p">}</span> <span class="p">}</span>
</pre></div> </pre></div>
</div> </div>
......
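
The Go structs above fully determine the queue protocol: tasks built from chunks start in the todo queue, move to pending when dispatched to a trainer, and move to done when the trainer reports completion. Below is a minimal C++ sketch of that lifecycle; the names and dispatch logic mirror the Go definitions but are illustrative only, not the actual master-server implementation.

```cpp
// Illustrative only: mirrors the Chunk/Task Go structs shown above.
#include <deque>
#include <string>
#include <vector>

struct Chunk {
  int idx;           // index of the chunk within the file
  std::string path;  // file this chunk belongs to
};

struct Task {
  int index;
  std::vector<Chunk> chunks;
};

struct TaskQueues {
  std::deque<Task> todo;     // filled when the task queues are created
  std::deque<Task> pending;  // dispatched to a trainer, not yet finished
  std::deque<Task> done;     // reported finished by a trainer

  // Hand the next task to a trainer: todo -> pending.
  Task dispatch() {
    Task t = todo.front();
    todo.pop_front();
    pending.push_back(t);
    return t;
  }

  // A trainer reported completion: pending -> done.
  void finish(int taskIndex) {
    for (auto it = pending.begin(); it != pending.end(); ++it) {
      if (it->index == taskIndex) {
        done.push_back(*it);
        pending.erase(it);
        return;
      }
    }
  }
};
```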
...@@ -233,7 +233,7 @@ name:sparse-n-1 ...@@ -233,7 +233,7 @@ name:sparse-n-1
<div class="highlight-c"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">paddle_begin_init_params</span><span class="p">(</span><span class="n">paddle_pserver_client</span><span class="o">*</span> <span class="n">client</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">config_proto</span><span class="p">);</span> <div class="highlight-c"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">paddle_begin_init_params</span><span class="p">(</span><span class="n">paddle_pserver_client</span><span class="o">*</span> <span class="n">client</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">config_proto</span><span class="p">);</span>
</pre></div> </pre></div>
</div> </div>
<p>The selected trainer&#8217;s call to <code class="docutils literal"><span class="pre">paddle_begin_init_params</span></code> will return with 1, and the other trainers&#8217; call to <code class="docutils literal"><span class="pre">paddle_begin_init_params</span></code> will block until initialization is done, and return 0. As illustrated below:</p> <p>The selected trainer&#8217;s call to <code class="docutils literal"><span class="pre">paddle_begin_init_params</span></code> will return 1, and the other trainers&#8217; calls to <code class="docutils literal"><span class="pre">paddle_begin_init_params</span></code> will return 0. <code class="docutils literal"><span class="pre">paddle_get_params</span></code> will block until initialization is completed. As illustrated below:</p>
<p><img src="./src/pserver_init.png"></p> <p><img src="./src/pserver_init.png"></p>
</div> </div>
</div> </div>
...@@ -266,16 +266,13 @@ name:sparse-n-1 ...@@ -266,16 +266,13 @@ name:sparse-n-1
<span class="cm"> *</span> <span class="cm"> *</span>
<span class="cm"> * paddle_begin_init_params will be called from multiple trainers,</span> <span class="cm"> * paddle_begin_init_params will be called from multiple trainers,</span>
<span class="cm"> * only one trainer will be selected to initialize the parameters on</span> <span class="cm"> * only one trainer will be selected to initialize the parameters on</span>
<span class="cm"> * parameter servers. Other trainers will be blocked until the</span> <span class="cm"> * parameter servers. Other trainers need to get the initialized</span>
<span class="cm"> * initialization is done, and they need to get the initialized</span>
<span class="cm"> * parameters from parameter servers using @paddle_get_params.</span> <span class="cm"> * parameters from parameter servers using @paddle_get_params.</span>
<span class="cm"> *</span> <span class="cm"> *</span>
<span class="cm"> * @param pserver_config_proto serialized parameter server configuration in</span>
<span class="cm"> * Protocol Buffers format.</span>
<span class="cm"> * @return 1 if the trainer is selected to initialize parameter</span> <span class="cm"> * @return 1 if the trainer is selected to initialize parameter</span>
<span class="cm"> * servers, otherwise 0.</span> <span class="cm"> * servers, otherwise 0.</span>
<span class="cm"> */</span> <span class="cm"> */</span>
<span class="kt">int</span> <span class="nf">paddle_begin_init_params</span><span class="p">(</span><span class="n">paddle_pserver_client</span><span class="o">*</span> <span class="n">client</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pserver_config_proto</span><span class="p">);</span> <span class="kt">int</span> <span class="nf">paddle_begin_init_params</span><span class="p">(</span><span class="n">paddle_pserver_client</span><span class="o">*</span> <span class="n">client</span><span class="p">);</span>
<span class="cm">/**</span> <span class="cm">/**</span>
<span class="cm"> * @brief paddle_init_param initializes the parameter on parameter</span> <span class="cm"> * @brief paddle_init_param initializes the parameter on parameter</span>
...@@ -283,12 +280,13 @@ name:sparse-n-1 ...@@ -283,12 +280,13 @@ name:sparse-n-1
<span class="cm"> *</span> <span class="cm"> *</span>
<span class="cm"> * @param param the parameter to initialize.</span> <span class="cm"> * @param param the parameter to initialize.</span>
<span class="cm"> * @param param_config_proto the configuration for the parameter.</span> <span class="cm"> * @param param_config_proto the configuration for the parameter.</span>
<span class="cm"> * @param config_len the length of param_config_proto</span>
<span class="cm"> * @return 0 if successful, otherwise -1. On failure, the trainer</span> <span class="cm"> * @return 0 if successful, otherwise -1. On failure, the trainer</span>
<span class="cm"> * needs to restart the entire initialization process (starting from</span> <span class="cm"> * needs to restart the entire initialization process (starting from</span>
<span class="cm"> * @paddle_begin_init_param). Or simply exit the program and wait for</span> <span class="cm"> * @paddle_begin_init_param). Or simply exit the program and wait for</span>
<span class="cm"> * the cluster management system to restart the trainer.</span> <span class="cm"> * the cluster management system to restart the trainer.</span>
<span class="cm"> */</span> <span class="cm"> */</span>
<span class="kt">int</span> <span class="nf">paddle_init_param</span><span class="p">(</span><span class="n">paddle_pserver_client</span><span class="o">*</span> <span class="n">client</span><span class="p">,</span> <span class="n">paddle_parameter</span> <span class="n">params</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">param_config_proto</span><span class="p">);</span> <span class="kt">int</span> <span class="nf">paddle_init_param</span><span class="p">(</span><span class="n">paddle_pserver_client</span><span class="o">*</span> <span class="n">client</span><span class="p">,</span> <span class="n">paddle_parameter</span> <span class="n">param</span><span class="p">,</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span><span class="o">*</span> <span class="n">param_config_proto</span><span class="p">,</span> <span class="kt">int</span> <span class="n">config_len</span><span class="p">);</span>
<span class="cm">/**</span> <span class="cm">/**</span>
<span class="cm"> * @brief paddle_finish_init_params tells parameter servers client has</span> <span class="cm"> * @brief paddle_finish_init_params tells parameter servers client has</span>
...@@ -315,6 +313,9 @@ name:sparse-n-1 ...@@ -315,6 +313,9 @@ name:sparse-n-1
<span class="cm">/**</span> <span class="cm">/**</span>
<span class="cm"> * @brief paddle_get_params gets parameters from parameter servers.</span> <span class="cm"> * @brief paddle_get_params gets parameters from parameter servers.</span>
<span class="cm"> *</span> <span class="cm"> *</span>
<span class="cm"> * paddle_get_params will block until parameters are initialized on</span>
<span class="cm"> * the parameter servers.</span>
<span class="cm"> *</span>
<span class="cm"> * @param names the array of names of the parameters to get.</span> <span class="cm"> * @param names the array of names of the parameters to get.</span>
<span class="cm"> * @param dst the destination array of parameters to save to.</span> <span class="cm"> * @param dst the destination array of parameters to save to.</span>
<span class="cm"> * @param len the length of the names array and the paddle_parameter</span> <span class="cm"> * @param len the length of the names array and the paddle_parameter</span>
......
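
To make the initialization protocol concrete, here is a hedged sketch of the trainer-side flow, written in C++ against the C API above. The `paddle_parameter` contents, `paddle_pserver_client_new()`, and the full signatures of `paddle_finish_init_params` and `paddle_get_params` are assumptions for illustration; only the protocol order (begin -> init -> finish -> get) follows the header excerpt.

```cpp
// Hypothetical trainer-side initialization flow; declarations marked
// "assumed" are not confirmed by the header excerpt above.
struct paddle_parameter { /* name/element type/content fields assumed */ };
struct paddle_pserver_client;

extern "C" {
paddle_pserver_client* paddle_pserver_client_new();  // assumed helper
int paddle_begin_init_params(paddle_pserver_client* client);
int paddle_init_param(paddle_pserver_client* client, paddle_parameter param,
                      const unsigned char* param_config_proto, int config_len);
int paddle_finish_init_params(paddle_pserver_client* client);  // assumed
int paddle_get_params(paddle_pserver_client* client, const char** names,
                      paddle_parameter** dst, int len);  // assumed
}

// One trainer wins the begin_init selection and pushes initial values; every
// trainer then blocks in paddle_get_params until initialization completes.
// On failure the caller should restart the whole process, per the header doc.
int trainer_init(paddle_pserver_client* client, paddle_parameter param,
                 const unsigned char* cfg, int cfg_len) {
  if (paddle_begin_init_params(client)) {  // returns 1 for the selected trainer
    if (paddle_init_param(client, param, cfg, cfg_len) != 0) return -1;
    if (paddle_finish_init_params(client) != 0) return -1;
  }
  const char* names[] = {"w0"};  // illustrative parameter name
  paddle_parameter* dst[1] = {nullptr};
  return paddle_get_params(client, names, dst, 1);
}
```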