# Imperative Programming (Dynamic Graph) Mechanism - DyGraph
PaddlePaddle's DyGraph mode is a dynamic graph execution mechanism: every operation executes immediately and returns its result, with no need to build a whole computation graph first. Unlike the traditional static execution of a computation graph, you no longer have to wait for the entire graph to finish before obtaining results. This makes building deep learning tasks and debugging models in PaddlePaddle far more intuitive, and it removes a large amount of code otherwise needed to construct static computation graphs, so writing and debugging networks becomes much more convenient.
PaddlePaddle DyGraph is a more flexible and user-friendly mode that provides:
* More flexible and convenient code organization: use Python control flow and object-oriented model design
* Easier debugging: print intermediate results directly with Python's print facilities, so you can inspect a running model and quickly test changes
* Model code shared with static graph execution: the same model code can be debugged and run in the more convenient DyGraph mode, and can also run in the original declarative (static graph) mode
For more real-model examples of the imperative programming mechanism, see [Paddle/models/dygraph](https://github.com/PaddlePaddle/models/tree/develop/dygraph)
## Setup and Basic Usage
1. Upgrade to the latest PaddlePaddle 1.6.0:
```
pip install -q --upgrade paddlepaddle==1.6.0
```
2. Use the `fluid.dygraph.guard(place=None)` context:
```python
import paddle.fluid as fluid
with fluid.dygraph.guard():
# write your executable dygraph code here
```
You can now run networks in DyGraph mode inside the `fluid.dygraph.guard()` context. DyGraph changes how PaddlePaddle executes: operations now run immediately and return their results to Python.
DyGraph works very well with NumPy: `fluid.dygraph.to_variable(x)` converts an `ndarray` into a `fluid.Variable`, and `fluid.Variable.numpy()` converts the computation result at any point back into a NumPy `ndarray`:
```python
import numpy as np
x = np.ones([2, 2], np.float32)
with fluid.dygraph.guard():
inputs = []
for _ in range(10):
inputs.append(fluid.dygraph.to_variable(x))
ret = fluid.layers.sums(inputs)
print(ret.numpy())
```
Output:
```
[[10. 10.]
[10. 10.]]
```
> Here we create a series of `ndarray` inputs, run a `sum` operation, and print the result directly.
Next, call `reduce_sum` and then `Variable.backward()` to run the backward pass; `Variable.gradient()` then returns the gradient computed by the backward network as an `ndarray`:
```python
loss = fluid.layers.reduce_sum(ret)
loss.backward()
print(loss.gradient())
```
Output:
```
[1.]
```
## Building Networks with DyGraph
1. Write object-oriented model code for DyGraph execution. A PaddlePaddle model consists of the following **two parts**. **Note: if the layer you design contains parameters, it must be described by an object-oriented class that inherits from `fluid.dygraph.Layer`.**
1. Build an object-oriented network that can run in DyGraph mode: inherit from `fluid.dygraph.Layer` and call the base class's `__init__`. In the constructor we usually perform operations that do not depend on dynamic input information, such as parameter initialization and sub-network initialization:
```python
class MyLayer(fluid.dygraph.Layer):
def __init__(self, input_size):
super(MyLayer, self).__init__()
self.linear = fluid.dygraph.nn.Linear(input_size, 12)
```
2. Implement a `forward(self, *inputs)` function that contains the network's actual runtime logic. It is called in every training/inference iteration; here we run a simple `linear` -> `relu` -> `elementwise mul` -> `reduce sum`:
```python
def forward(self, inputs):
x = self.linear(inputs)
x = fluid.layers.relu(x)
self._x_for_debug = x
x = fluid.layers.elementwise_mul(x, x)
x = fluid.layers.reduce_sum(x)
return [x]
```
2. Execute inside `fluid.dygraph.guard()`:
1. Build the input with NumPy:
```python
np_inp = np.array([[1.0, 2.0, -1.0]], dtype=np.float32)
```
2. Convert the input `ndarray` to a `Variable` and run the forward network: `fluid.dygraph.to_variable(np_inp)` converts the NumPy input into an input DyGraph accepts; `my_layer(var_inp)[0]` calls the callable object and takes `x` as the return value; `x.numpy()` fetches the computed value of `x` directly as an `ndarray`.
```python
with fluid.dygraph.guard():
var_inp = fluid.dygraph.to_variable(np_inp)
my_layer = MyLayer(np_inp.shape[-1])
x = my_layer(var_inp)[0]
dy_out = x.numpy()
```
3. Compute gradients: automatic differentiation is useful for implementing machine learning algorithms, e.g. backpropagation for training neural networks. Use `x.backward()` to run the backward network starting from a given `fluid.Variable`, and `my_layer._x_for_debug.gradient()` to fetch the gradient of `x` in the network as an `ndarray`:
```python
x.backward()
dy_grad = my_layer._x_for_debug.gradient()
```
The complete code:
```python
import paddle.fluid as fluid
import numpy as np
class MyLayer(fluid.dygraph.Layer):
def __init__(self, input_size):
super(MyLayer, self).__init__()
self.linear = fluid.dygraph.nn.Linear(input_size, 12)
def forward(self, inputs):
x = self.linear(inputs)
x = fluid.layers.relu(x)
self._x_for_debug = x
x = fluid.layers.elementwise_mul(x, x)
x = fluid.layers.reduce_sum(x)
return [x]
if __name__ == '__main__':
np_inp = np.array([[1.0, 2.0, -1.0]], dtype=np.float32)
with fluid.dygraph.guard():
var_inp = fluid.dygraph.to_variable(np_inp)
my_layer = MyLayer(np_inp.shape[-1])
x = my_layer(var_inp)[0]
dy_out = x.numpy()
x.backward()
dy_grad = my_layer._x_for_debug.gradient()
my_layer.clear_gradients()  # zero the parameter gradients to keep the next training iteration correct
```
### Automatic Pruning
Every ``Variable`` has a ``stop_gradient`` attribute that can exclude parts of the graph from the backward gradient computation at fine granularity, improving efficiency.
If any input of an operator requires gradients, its output requires gradients as well.
Conversely, an operator's output requires no gradients only when none of its inputs do.
In a subgraph where no ``Variable`` requires gradients, no backward computation is performed at all.
In imperative mode, the ``stop_gradient`` attribute of every non-parameter ``Variable`` defaults to ``True``, while that of parameters defaults to ``False``.
This attribute drives automatic pruning, avoiding unnecessary backward computation.
For example:
```python
import paddle.fluid as fluid
import numpy as np
with fluid.dygraph.guard():
x = fluid.dygraph.to_variable(np.random.randn(5, 5))  # stop_gradient=True by default
y = fluid.dygraph.to_variable(np.random.randn(5, 5))  # stop_gradient=True by default
z = fluid.dygraph.to_variable(np.random.randn(5, 5))
z.stop_gradient = False
a = x + y
a.stop_gradient # True
b = a + z
b.stop_gradient # False
```
This is very useful when you want to freeze part of your model, or when you know in advance that you will never need the gradients of certain parameters.
For example:
```python
import paddle.fluid as fluid
import numpy as np
with fluid.dygraph.guard():
value0 = np.arange(26).reshape(2, 13).astype("float32")
value1 = np.arange(6).reshape(2, 3).astype("float32")
value2 = np.arange(10).reshape(2, 5).astype("float32")
fc = fluid.Linear(13, 5, dtype="float32")
fc2 = fluid.Linear(3, 3, dtype="float32")
a = fluid.dygraph.to_variable(value0)
b = fluid.dygraph.to_variable(value1)
c = fluid.dygraph.to_variable(value2)
out1 = fc(a)
out2 = fc2(b)
out1.stop_gradient = True  # no backward computation will be performed for the out1 subgraph
out = fluid.layers.concat(input=[out1, out2, c], axis=1)
out.backward()
# observe that the gradients of the fc parameters are all 0 here
assert (fc.weight.gradient() == 0).all()
assert (out1.gradient() == 0).all()
```
## Training a Model with DyGraph
Next we take "handwritten digit recognition", the most basic of models, as an example and show how to build and train a model in DyGraph mode.
For the theory behind handwritten digit recognition, see [PaddleBook](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits); we assume here that you already know the deep learning theory this model requires.
1. Prepare the data. We use `paddle.dataset.mnist` as the training dataset:
```python
train_reader = paddle.batch(
paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
```
2. Build the network. You can define the whole network structure yourself as introduced above, but you can also use the basic building blocks we provide under `fluid.dygraph`. Here we use `fluid.dygraph.Conv2D` and `fluid.dygraph.Pool2D` to build a basic `SimpleImgConvPool`:
```python
class SimpleImgConvPool(fluid.dygraph.Layer):
def __init__(self,
num_channels,
num_filters,
filter_size,
pool_size,
pool_stride,
pool_padding=0,
pool_type='max',
global_pooling=False,
conv_stride=1,
conv_padding=0,
conv_dilation=1,
conv_groups=1,
act=None,
use_cudnn=False,
param_attr=None,
bias_attr=None):
super(SimpleImgConvPool, self).__init__()
self._conv2d = fluid.dygraph.Conv2D(
num_channels=num_channels,
num_filters=num_filters,
filter_size=filter_size,
stride=conv_stride,
padding=conv_padding,
dilation=conv_dilation,
groups=conv_groups,
param_attr=param_attr,
bias_attr=bias_attr,
act=act,
use_cudnn=use_cudnn)
self._pool2d = fluid.dygraph.Pool2D(
pool_size=pool_size,
pool_type=pool_type,
pool_stride=pool_stride,
pool_padding=pool_padding,
global_pooling=global_pooling,
use_cudnn=use_cudnn)
def forward(self, inputs):
x = self._conv2d(inputs)
x = self._pool2d(x)
return x
```
> Note: define and configure sub-networks in `__init__`; execute them in the `forward` function.
3. Compose the final `MNIST` network from the `SimpleImgConvPool` built above:
```python
class MNIST(fluid.dygraph.Layer):
def __init__(self):
super(MNIST, self).__init__()
self._simple_img_conv_pool_1 = SimpleImgConvPool(
1, 20, 5, 2, 2, act="relu")
self._simple_img_conv_pool_2 = SimpleImgConvPool(
20, 50, 5, 2, 2, act="relu")
self.pool_2_shape = 50 * 4 * 4
SIZE = 10
scale = (2.0 / (self.pool_2_shape**2 * SIZE))**0.5
self._fc = fluid.dygraph.Linear(
self.pool_2_shape,
10,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.NormalInitializer(
loc=0.0, scale=scale)),
act="softmax")
def forward(self, inputs, label=None):
x = self._simple_img_conv_pool_1(inputs)
x = self._simple_img_conv_pool_2(x)
x = fluid.layers.reshape(x, shape=[-1, self.pool_2_shape])
x = self._fc(x)
if label is not None:
acc = fluid.layers.accuracy(input=x, label=label)
return x, acc
else:
return x
```
4. Define the configured `MNIST` network inside `fluid.dygraph.guard()`. Even without training, you can call the model there and inspect its output:
```python
with fluid.dygraph.guard():
mnist = MNIST()
train_reader = paddle.batch(
paddle.dataset.mnist.train(), batch_size=32, drop_last=True)
id, data = list(enumerate(train_reader()))[0]
dy_x_data = np.array(
[x[0].reshape(1, 28, 28)
for x in data]).astype('float32')
img = fluid.dygraph.to_variable(dy_x_data)
print("result is: {}".format(mnist(img).numpy()))
```
Output:
```
result is: [[0.10135901 0.1051138 0.1027941 ... 0.0972859 0.10221873 0.10165327]
[0.09735426 0.09970362 0.10198303 ... 0.10134517 0.10179105 0.10025002]
[0.09539858 0.10213123 0.09543551 ... 0.10613529 0.10535969 0.097991 ]
...
[0.10120598 0.0996111 0.10512722 ... 0.10067689 0.10088114 0.10071224]
[0.09889644 0.10033772 0.10151272 ... 0.10245881 0.09878646 0.101483 ]
[0.09097178 0.10078511 0.10198414 ... 0.10317434 0.10087223 0.09816764]]
```
5. Build the training loop, calling `mnist.clear_gradients()` to reset the gradients after each parameter update:
```python
with fluid.dygraph.guard():
epoch_num = 5
BATCH_SIZE = 64
train_reader = paddle.batch(
paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
mnist = MNIST()
adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
for epoch in range(epoch_num):
for batch_id, data in enumerate(train_reader()):
dy_x_data = np.array([x[0].reshape(1, 28, 28)
for x in data]).astype('float32')
y_data = np.array(
[x[1] for x in data]).astype('int64').reshape(-1, 1)
img = fluid.dygraph.to_variable(dy_x_data)
label = fluid.dygraph.to_variable(y_data)
cost = mnist(img)
loss = fluid.layers.cross_entropy(cost, label)
avg_loss = fluid.layers.mean(loss)
if batch_id % 100 == 0 and batch_id != 0:
print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
avg_loss.backward()
adam.minimize(avg_loss)
mnist.clear_gradients()
```
6. Variables and the optimizer
Model parameters, or any values you want to monitor, can be encapsulated in the class as variables and fetched through the object; call `numpy()` to obtain their `ndarray` values. During training, `mnist.parameters()` returns all parameters of the network; you can also take a specific parameter of a specific `Layer`, or call that layer's `parameters()` for all of its parameters, and inspect values at any time via `numpy()`.
After the backward pass, call the `minimize` method of the previously defined `Adam` optimizer object to update the parameters:
```python
with fluid.dygraph.guard():
epoch_num = 5
BATCH_SIZE = 64
mnist = MNIST()
adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
train_reader = paddle.batch(
paddle.dataset.mnist.train(), batch_size= BATCH_SIZE, drop_last=True)
np.set_printoptions(precision=3, suppress=True)
for epoch in range(epoch_num):
for batch_id, data in enumerate(train_reader()):
dy_x_data = np.array(
[x[0].reshape(1, 28, 28)
for x in data]).astype('float32')
y_data = np.array(
[x[1] for x in data]).astype('int64').reshape(BATCH_SIZE, 1)
img = fluid.dygraph.to_variable(dy_x_data)
label = fluid.dygraph.to_variable(y_data)
label.stop_gradient = True
cost = mnist(img)
loss = fluid.layers.cross_entropy(cost, label)
avg_loss = fluid.layers.mean(loss)
dy_out = avg_loss.numpy()
avg_loss.backward()
adam.minimize(avg_loss)
mnist.clear_gradients()
dy_param_value = {}
for param in mnist.parameters():
dy_param_value[param.name] = param.numpy()
if batch_id % 20 == 0:
print("Loss at step {}: {}".format(batch_id, avg_loss.numpy()))
print("Final loss: {}".format(avg_loss.numpy()))
print("_simple_img_conv_pool_1_conv2d W's mean is: {}".format(mnist._simple_img_conv_pool_1._conv2d._filter_param.numpy().mean()))
print("_simple_img_conv_pool_1_conv2d Bias's mean is: {}".format(mnist._simple_img_conv_pool_1._conv2d._bias_param.numpy().mean()))
```
Output:
```
Loss at step 0: [2.302]
Loss at step 20: [1.616]
Loss at step 40: [1.244]
Loss at step 60: [1.142]
Loss at step 80: [0.911]
Loss at step 100: [0.824]
Loss at step 120: [0.774]
Loss at step 140: [0.626]
Loss at step 160: [0.609]
Loss at step 180: [0.627]
Loss at step 200: [0.466]
Loss at step 220: [0.499]
Loss at step 240: [0.614]
Loss at step 260: [0.585]
Loss at step 280: [0.503]
Loss at step 300: [0.423]
Loss at step 320: [0.509]
Loss at step 340: [0.348]
Loss at step 360: [0.452]
Loss at step 380: [0.397]
Loss at step 400: [0.54]
Loss at step 420: [0.341]
Loss at step 440: [0.337]
Loss at step 460: [0.155]
Final loss: [0.164]
_simple_img_conv_pool_1_conv2d W's mean is: 0.00606656912714
_simple_img_conv_pool_1_conv2d Bias's mean is: -3.4576318285e-05
```
7. Performance
When entering `fluid.dygraph.guard()` you can pass `fluid.CUDAPlace(0)` or `fluid.CPUPlace()` to choose the device on which DyGraph runs; by default, with no explicit choice, the device is adapted to your machine automatically.
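A minimal sketch of explicit device selection (assuming `fluid.is_compiled_with_cuda()` is available, as in Fluid 1.x): pick the first GPU when Paddle is built with CUDA, otherwise fall back to the CPU:
```python
import paddle.fluid as fluid

# choose the device explicitly instead of relying on auto-detection
place = fluid.CUDAPlace(0) if fluid.is_compiled_with_cuda() else fluid.CPUPlace()
with fluid.dygraph.guard(place):
    # all DyGraph operations in this context run on `place`
    pass
```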
## Multi-GPU Training
PaddlePaddle currently supports multi-GPU training via multiple processes, one process per GPU. During training, when the first forward operation that needs parameters runs, the parameters on GPU 0 are broadcast to the other GPUs to ensure all GPUs hold identical parameters; after the backward computation, the resulting parameter gradients are aggregated across all GPUs; finally, each GPU updates its parameters separately.
```python
place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
with fluid.dygraph.guard(place):
strategy = fluid.dygraph.parallel.prepare_context()
epoch_num = 5
BATCH_SIZE = 64
mnist = MNIST()
adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
mnist = fluid.dygraph.parallel.DataParallel(mnist, strategy)
train_reader = paddle.batch(
paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
train_reader = fluid.contrib.reader.distributed_batch_reader(
train_reader)
for epoch in range(epoch_num):
for batch_id, data in enumerate(train_reader()):
dy_x_data = np.array([x[0].reshape(1, 28, 28)
for x in data]).astype('float32')
y_data = np.array(
[x[1] for x in data]).astype('int64').reshape(-1, 1)
img = fluid.dygraph.to_variable(dy_x_data)
label = fluid.dygraph.to_variable(y_data)
label.stop_gradient = True
cost, acc = mnist(img, label)
loss = fluid.layers.cross_entropy(cost, label)
avg_loss = fluid.layers.mean(loss)
avg_loss = mnist.scale_loss(avg_loss)
avg_loss.backward()
mnist.apply_collective_grads()
adam.minimize(avg_loss)
mnist.clear_gradients()
if batch_id % 100 == 0 and batch_id != 0:
print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
```
Converting single-GPU imperative-mode training to multi-GPU training requires four main changes:
1. Obtain the device ID from the environment variables:
```python
place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
```
2. Preprocess the original model:
```python
strategy = fluid.dygraph.parallel.prepare_context()
mnist = MNIST()
adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
mnist = fluid.dygraph.parallel.DataParallel(mnist, strategy)
```
3. Data reading: each process must read different data, i.e. the intersection of the data read by all processes is empty, and their union is the complete dataset:
```python
train_reader = paddle.batch(
paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
train_reader = fluid.contrib.reader.distributed_batch_reader(
train_reader)
```
4. Scale the loss and aggregate the parameter gradients:
```python
avg_loss = mnist.scale_loss(avg_loss)
avg_loss.backward()
mnist.apply_collective_grads()
```
When launching multi-process multi-GPU imperative-mode training, you must specify the GPUs to use; e.g. to use GPUs `0,1,2,3`, launch as follows:
```
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train.py
```
The output is:
```
----------- Configuration Arguments -----------
cluster_node_ips: 127.0.0.1
log_dir: ./mylog
node_ip: 127.0.0.1
print_config: True
selected_gpus: 0,1,2,3
started_port: 6170
training_script: train.py
training_script_args: ['--use_data_parallel', '1']
use_paddlecloud: True
------------------------------------------------
trainers_endpoints: 127.0.0.1:6170,127.0.0.1:6171,127.0.0.1:6172,127.0.0.1:6173 , node_id: 0 , current_node_ip: 127.0.0.1 , num_nodes: 1 , node_ips: ['127.0.0.1'] , nranks: 4
```
The program then writes the output log of each process under ./mylog:
```
.
├── mylog
│ ├── workerlog.0
│ ├── workerlog.1
│ ├── workerlog.2
│ └── workerlog.3
└── train.py
```
If `--log_dir` is not specified, the program prints the output of all processes, i.e.:
```
----------- Configuration Arguments -----------
cluster_node_ips: 127.0.0.1
log_dir: None
node_ip: 127.0.0.1
print_config: True
selected_gpus: 0,1,2,3
started_port: 6170
training_script: train.py
training_script_args: ['--use_data_parallel', '1']
use_paddlecloud: True
------------------------------------------------
trainers_endpoints: 127.0.0.1:6170,127.0.0.1:6171,127.0.0.1:6172,127.0.0.1:6173 , node_id: 0 , current_node_ip: 127.0.0.1 , num_nodes: 1 , node_ips: ['127.0.0.1'] , nranks: 4
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
I0923 09:32:36.423513 56410 nccl_context.cc:120] init nccl context nranks: 4 local rank: 1 gpu id: 1
I0923 09:32:36.425287 56411 nccl_context.cc:120] init nccl context nranks: 4 local rank: 2 gpu id: 2
I0923 09:32:36.429337 56409 nccl_context.cc:120] init nccl context nranks: 4 local rank: 0 gpu id: 0
I0923 09:32:36.429440 56412 nccl_context.cc:120] init nccl context nranks: 4 local rank: 3 gpu id: 3
W0923 09:32:42.594097 56412 device_context.cc:198] Please NOTE: device: 3, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0923 09:32:42.605836 56412 device_context.cc:206] device: 3, cuDNN Version: 7.5.
W0923 09:32:42.632463 56410 device_context.cc:198] Please NOTE: device: 1, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0923 09:32:42.637948 56410 device_context.cc:206] device: 1, cuDNN Version: 7.5.
W0923 09:32:42.648674 56411 device_context.cc:198] Please NOTE: device: 2, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0923 09:32:42.654021 56411 device_context.cc:206] device: 2, cuDNN Version: 7.5.
W0923 09:32:43.048696 56409 device_context.cc:198] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0923 09:32:43.053236 56409 device_context.cc:206] device: 0, cuDNN Version: 7.5.
start data reader (trainers_num: 4, trainer_id: 2)
start data reader (trainers_num: 4, trainer_id: 3)
start data reader (trainers_num: 4, trainer_id: 1)
start data reader (trainers_num: 4, trainer_id: 0)
Loss at epoch 0 step 0: [0.57390565]
Loss at epoch 0 step 0: [0.57523954]
Loss at epoch 0 step 0: [0.575606]
Loss at epoch 0 step 0: [0.5767452]
```
## Saving Model Parameters
In imperative mode the model and the optimizer are stored in separate objects, so model parameters and optimizer state must be saved separately.

During training, use `paddle.fluid.dygraph.save_dygraph(state_dict, model_path)` to save either the model-parameter dict or the optimizer-state dict.
Likewise, use `paddle.fluid.dygraph.load_dygraph(model_path)` to retrieve the saved model-parameter dict and optimizer-state dict.
Then call the `your_model_object.set_dict(para_dict)` interface to restore the saved model parameters and resume training,
and the `your_optimizer_object.set_dict(opti_dict)` interface to restore saved optimizer state such as the `learning rate decay` value.
The following code shows, for the "handwritten digit recognition" task, how to save the parameters and read saved parameters back to continue training.
```python
import numpy as np
import paddle
import paddle.fluid as fluid
with fluid.dygraph.guard():
epoch_num = 5
BATCH_SIZE = 64
mnist = MNIST()
adam = fluid.optimizer.Adam(learning_rate=0.001, parameter_list=mnist.parameters())
train_reader = paddle.batch(
paddle.dataset.mnist.train(), batch_size= BATCH_SIZE, drop_last=True)
np.set_printoptions(precision=3, suppress=True)
dy_param_init_value={}
for epoch in range(epoch_num):
for batch_id, data in enumerate(train_reader()):
dy_x_data = np.array(
[x[0].reshape(1, 28, 28)
for x in data]).astype('float32')
y_data = np.array(
[x[1] for x in data]).astype('int64').reshape(BATCH_SIZE, 1)
img = fluid.dygraph.to_variable(dy_x_data)
label = fluid.dygraph.to_variable(y_data)
label.stop_gradient = True
cost = mnist(img)
loss = fluid.layers.cross_entropy(cost, label)
avg_loss = fluid.layers.mean(loss)
dy_out = avg_loss.numpy()
avg_loss.backward()
adam.minimize(avg_loss)
if batch_id == 20:
fluid.dygraph.save_dygraph(mnist.state_dict(), "paddle_dy")
mnist.clear_gradients()
if batch_id == 20:
for param in mnist.parameters():
dy_param_init_value[param.name] = param.numpy()
model, _ = fluid.dygraph.load_dygraph("paddle_dy")
mnist.set_dict(model)
break
if epoch == 0:
break
restore = mnist.parameters()
# check save and load
success = True
for value in restore:
if (not np.array_equal(value.numpy(), dy_param_init_value[value.name])) or (not np.isfinite(value.numpy()).all()) or (np.isnan(value.numpy()).any()):
success = False
print("model save and load success? {}".format(success))
```
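The snippet above checkpoints only the model parameters. The optimizer state described earlier can be handled the same way; a minimal sketch, reusing the `mnist` and `adam` objects defined above:
```python
# save both state dicts under the same path prefix
fluid.dygraph.save_dygraph(mnist.state_dict(), "paddle_dy")
fluid.dygraph.save_dygraph(adam.state_dict(), "paddle_dy")

# load_dygraph returns (parameter dict, optimizer-state dict)
para_dict, opti_dict = fluid.dygraph.load_dygraph("paddle_dy")
mnist.set_dict(para_dict)
adam.set_dict(opti_dict)
```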
Note that with multi-GPU training only one process needs to save the model parameters, so specify which process's parameters are saved, e.g.:
```python
if fluid.dygraph.parallel.Env().local_rank == 0:
fluid.dygraph.save_dygraph(mnist.state_dict(), "paddle_dy")
```
## Model Evaluation
When you want to run inference with a model built in DyGraph mode, call the `YourModel.eval()` interface once inside the `fluid.dygraph.guard()` context to switch to inference mode. For example, in the handwritten digit model above, `mnist.eval()` switches to inference mode. The explicit call to `YourModel.eval()` is needed because the default mode inside `fluid.dygraph.guard()` is training mode: in training mode, DyGraph differentiates automatically while running the forward network and builds the backward network, whereas at inference time DyGraph only needs to run the forward network, with no automatic differentiation or backward execution.
**Note: if you run the `YourModel` model on a `GPU` device and never call `loss.backward` (typically, when doing inference), you must call `YourModel.eval()` to avoid building the backward network, otherwise you may run out of GPU memory.**
The following code trains a model for the "handwritten digit recognition" task in DyGraph mode, saves it, and uses the saved model for inference.
We save and train the model inside a `fluid.dygraph.guard()` context. Note that when you need to run inference during training, use `YourModel.eval()` to switch to inference mode, and after inference use `YourModel.train()` to switch back to training mode and continue training.
In `inference_mnist` we open another `fluid.dygraph.guard()` and `load` the previously saved `checkpoint` in its context for inference; likewise, call `YourModel.eval()` before inference to switch to inference mode.
```python
import os
import numpy as np
from PIL import Image
import paddle
import paddle.fluid as fluid

def test_mnist(reader, model, batch_size):
acc_set = []
avg_loss_set = []
for batch_id, data in enumerate(reader()):
dy_x_data = np.array([x[0].reshape(1, 28, 28)
for x in data]).astype('float32')
y_data = np.array(
[x[1] for x in data]).astype('int64').reshape(batch_size, 1)
img = fluid.dygraph.to_variable(dy_x_data)
label = fluid.dygraph.to_variable(y_data)
label.stop_gradient = True
prediction, acc = model(img, label)
loss = fluid.layers.cross_entropy(input=prediction, label=label)
avg_loss = fluid.layers.mean(loss)
acc_set.append(float(acc.numpy()))
avg_loss_set.append(float(avg_loss.numpy()))
# get test acc and loss
acc_val_mean = np.array(acc_set).mean()
avg_loss_val_mean = np.array(avg_loss_set).mean()
return avg_loss_val_mean, acc_val_mean
def inference_mnist():
with fluid.dygraph.guard():
mnist_infer = MNIST()
# load checkpoint
model_dict, _ = fluid.dygraph.load_dygraph("paddle_dy")
mnist_infer.load_dict(model_dict)
print("checkpoint loaded")
# start evaluate mode
mnist_infer.eval()
def load_image(file):
im = Image.open(file).convert('L')
im = im.resize((28, 28), Image.ANTIALIAS)
im = np.array(im).reshape(1, 1, 28, 28).astype(np.float32)
im = im / 255.0 * 2.0 - 1.0
return im
cur_dir = os.path.dirname(os.path.realpath(__file__))
tensor_img = load_image(cur_dir + '/image/infer_3.png')
results = mnist_infer(fluid.dygraph.to_variable(tensor_img))
lab = np.argsort(results.numpy())
print("Inference result of image/infer_3.png is: %d" % lab[0][-1])
with fluid.dygraph.guard():
epoch_num = 1
BATCH_SIZE = 64
mnist = MNIST()
adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
test_reader = paddle.batch(
paddle.dataset.mnist.test(), batch_size=BATCH_SIZE, drop_last=True)
train_reader = paddle.batch(
paddle.dataset.mnist.train(),
batch_size=BATCH_SIZE,
drop_last=True)
for epoch in range(epoch_num):
for batch_id, data in enumerate(train_reader()):
dy_x_data = np.array([x[0].reshape(1, 28, 28)
for x in data]).astype('float32')
y_data = np.array(
[x[1] for x in data]).astype('int64').reshape(-1, 1)
img = fluid.dygraph.to_variable(dy_x_data)
label = fluid.dygraph.to_variable(y_data)
label.stop_gradient = True
cost, acc = mnist(img, label)
loss = fluid.layers.cross_entropy(cost, label)
avg_loss = fluid.layers.mean(loss)
avg_loss.backward()
adam.minimize(avg_loss)
# save checkpoint
mnist.clear_gradients()
if batch_id % 100 == 0:
print("Loss at epoch {} step {}: {:}".format(
epoch, batch_id, avg_loss.numpy()))
mnist.eval()
test_cost, test_acc = test_mnist(test_reader, mnist, BATCH_SIZE)
mnist.train()
print("Loss at epoch {} , Test avg_loss is: {}, acc is: {}".format(
epoch, test_cost, test_acc))
fluid.dygraph.save_dygraph(mnist.state_dict(), "paddle_dy")
print("checkpoint saved")
inference_mnist()
```
Output:
```
Loss at epoch 0 step 0: [2.2991252]
Loss at epoch 0 step 100: [0.15491392]
Loss at epoch 0 step 200: [0.13315125]
Loss at epoch 0 step 300: [0.10253005]
Loss at epoch 0 step 400: [0.04266362]
Loss at epoch 0 step 500: [0.08894891]
Loss at epoch 0 step 600: [0.08999012]
Loss at epoch 0 step 700: [0.12975612]
Loss at epoch 0 step 800: [0.15257305]
Loss at epoch 0 step 900: [0.07429226]
Loss at epoch 0 , Test avg_loss is: 0.05995981965082674, acc is: 0.9794671474358975
checkpoint saved
No optimizer loaded. If you didn't save optimizer, please ignore this. The program can still work with new optimizer.
checkpoint loaded
Inference result of image/infer_3.png is: 3
```
## Writing Compatible Models
Taking the handwritten digit example of the previous step, model code written for imperative mode can be used directly as model code in declarative (static graph) mode: just execute it with PaddlePaddle's declarative execution facilities. Here, using the declarative-mode `executor` as an example, the model code from before is unchanged, and an `Executor` runs it:
```python
import numpy as np
import paddle
import paddle.fluid as fluid

epoch_num = 1
BATCH_SIZE = 64
exe = fluid.Executor(fluid.CPUPlace())
mnist = MNIST()
sgd = fluid.optimizer.SGDOptimizer(learning_rate=1e-3, parameter_list=mnist.parameters())
train_reader = paddle.batch(
paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
img = fluid.layers.data(
name='pixel', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
cost = mnist(img)
loss = fluid.layers.cross_entropy(cost, label)
avg_loss = fluid.layers.mean(loss)
sgd.minimize(avg_loss)
out = exe.run(fluid.default_startup_program())
for epoch in range(epoch_num):
for batch_id, data in enumerate(train_reader()):
static_x_data = np.array(
[x[0].reshape(1, 28, 28)
for x in data]).astype('float32')
y_data = np.array(
[x[1] for x in data]).astype('int64').reshape([BATCH_SIZE, 1])
fetch_list = [avg_loss.name]
out = exe.run(
fluid.default_main_program(),
feed={"pixel": static_x_data,
"label": y_data},
fetch_list=fetch_list)
static_out = out[0]
if batch_id % 100 == 0 and batch_id != 0:
print("epoch: {}, batch_id: {}, loss: {}".format(epoch, batch_id, static_out))
```
.. _cn_user_guide_Executor:
========
Executor
========
The design of PaddlePaddle (hereafter Paddle) resembles that of high-level programming languages such as C++ and Java: program execution is split into a compilation stage and an execution stage.
After the user finishes defining a Program, the Executor accepts it and converts it into a FluidProgram that the C++ backend can actually execute; this automatic process is called compilation.
After compilation, the Executor runs the compiled FluidProgram.
For example, for the addition operation implemented earlier, once the Program is built, create an Executor and run the startup Program and the training Program:
.. code-block:: python
import paddle.fluid as fluid
import numpy
a = fluid.data(name="a",shape=[1],dtype='float32')
b = fluid.data(name="b",shape=[1],dtype='float32')
result = fluid.layers.elementwise_add(a,b)
# create the executor and set the execution device to CPU
cpu = fluid.core.CPUPlace()
exe = fluid.Executor(cpu)
exe.run(fluid.default_startup_program())
x = numpy.array([5]).astype("float32")
y = numpy.array([7]).astype("float32")
outs = exe.run(
feed={'a':x,'b':y},
fetch_list=[result])
# print the result: [array([12.], dtype=float32)]
print( outs )
===================================================
Design Philosophy and Basic Concepts of PaddleFluid
===================================================
Paddle Fluid is designed to let users write programs the way users of PyTorch and TensorFlow Eager Execution do.
In these systems there is no longer a concept of a "model"; an application no longer contains a symbolic description of an Operator graph or of a stack of layers,
but instead describes the training or inference process the way a general-purpose program does.
The Evolution of Deep Learning Platforms
========================================
Deep learning has become the de facto most popular machine learning technique. Years of academic research plus long industrial practice have produced a number of effective basic modeling units:
fully connected, convolutional, and recurrent networks, among others; devised training techniques: initialization schemes, cross-layer connections, and various normalization methods;
invented new optimization algorithms: Adadelta, Adam, and more;
and produced a stream of fixed network structures: highway, residual, attention, and many others, too numerous to list.
Years of effort from academia and industry together created the influence deep learning methods enjoy today.
Academic research and production practice have accumulated extensive knowledge that explains well the individual learning capacity and characteristics of the basic modules in a neural network.
Combinations of basic modules and training techniques can build endlessly varied neural network models.
The basic modules and training techniques are finite, but their combinations are infinitely varied; this is both the appeal and the difficulty of deep learning methods.
Precisely because of this high modularity, researchers and engineers strive to avoid reinventing the wheel so as to raise the efficiency of research and production,
which in turn has spurred the development of deep learning platform technology; deep learning frameworks have become an important part of AI infrastructure.
From Theano, to DistBelief, to TensorFlow; from Caffe to Caffe2;
from Torch to PyTorch; from PaddlePaddle to PaddleFluid,
deep learning platform technology has gone through two generations of evolution and is moving toward a third generation.
Standing where history has brought us today, when we are about to try a new deep learning platform to support our own study and research,
what has evolved in platform technology, and what conveniences can it bring us?
Let us first look at the three major problems a deep learning framework solves:
- How to describe computation so as to support new models that may appear in the future?
- How to use heterogeneous devices efficiently and maximize compute power?
- How to use the machines in a network for distributed computation to process data at the scale of quadrillions?
The first of the three problems above is the one most closely related to users and researchers.
In this article we analyze the design philosophy of PaddleFluid
to understand how a deep learning framework abstracts deep learning models, and to see how our experience can carry over and migrate between deep learning platforms.
How to Describe Computation
===========================
Let us first look at how PaddleFluid describes a machine learning model.
PaddleFluid's :code:`Program`
How computation is described largely determines how complete a neural network framework's computational capabilities are.
After more than twenty years of development of deep learning models and methods, "run a sequence of forward computations,
then run the backward computations in the reverse order, with no branching or interaction in between"
is a model structure that can no longer satisfy the imagination of researchers and of the framework's countless users.
Judging from `PaddleFluid's design goals <https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/motivation/fluid.md>`_,
on the core question of how to describe a machine learning model, PaddleFluid's goal is:
to create a new way of describing computation that can express not only all mainstream neural network models known to date, but also any model that may appear in the future.
How does PaddleFluid achieve support for models that do not yet exist? Its design choice is that,
for the user, a machine learning model is described by a piece of :code:`Program` (converted inside PaddleFluid into a description language called :code:`ProgramDesc`)
rather than by a computation graph. :code:`Program` provides, in a way that matches the user's intuition,
a new description language able to describe arbitrarily complex machine learning models.
The first lesson every computer science student learns about programming languages is the three execution structures: sequential execution, selection, and loops.
All computable logic in the computing world is expressed with these three structures, and logic described with them is computable. By the same token,
if a neural network framework offers the same support for these three execution structures as a programming language does, it can describe any machine learning model a computer can compute.
PaddleFluid describes arbitrarily complex models by supporting all three execution structures; a loop sketch follows after this list.
Specifically:
1. Fluid's core design concepts all have programming-language analogues; if you have experience writing programs, building a neural network model with Fluid will feel very close to writing a program;
2. In PaddleFluid, the user never explicitly perceives a "computation graph"; a machine learning model is described as a Fluid :code:`Program` (called :code:`ProgramDesc` inside Fluid);
- a Fluid :code:`Program` consists of a set of nested :code:`Block` s; the concept of a :code:`Block` is analogous to a pair of braces in C++ or Java, or to an indented block in Python;
- the computation in a :code:`Block` is composed of sequential execution, selection, and loops, building up complex computational logic.
3. A Fluid :code:`Program` contains descriptions of computations and of the objects they act on. A computation is called an Operator; the objects computations act on (in other words, the inputs and outputs of Operators) are uniformly Tensors.
On describing computation and its operands, all deep learning frameworks have made the same choice; if you have experience with one platform, migrating between platforms is very easy.
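As a minimal sketch of the loop structure expressed inside a :code:`Program` (assuming the Fluid 1.x control-flow API, :code:`fluid.layers.While`), the following counts to 10 entirely within the Program:

.. code-block:: python

    import paddle.fluid as fluid

    i = fluid.layers.fill_constant(shape=[1], dtype='int64', value=0)
    limit = fluid.layers.fill_constant(shape=[1], dtype='int64', value=10)
    cond = fluid.layers.less_than(x=i, y=limit)
    while_op = fluid.layers.While(cond=cond)
    with while_op.block():
        # the loop body is recorded into a nested Block of the Program
        i = fluid.layers.increment(x=i, value=1, in_place=True)
        fluid.layers.less_than(x=i, y=limit, cond=cond)  # update the loop condition

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())
    print(exe.run(fluid.default_main_program(), fetch_list=[i]))  # [array([10])]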
Core Concepts
=============
Below we look in more detail at how the core concepts are used in PaddlePaddle.
Tensor: the Object of Data Representation and Computation
---------------------------------------------------------
Tensor extends the concepts of vector and matrix and is the basic object that neural network model computations operate on. This is the common choice of all mainstream deep learning platforms today.
A Tensor can be understood simply as an N-dimensional array; it can have arbitrarily many dimensions. A Tensor has two basic properties:
1. Data type: every element of a Tensor has the same, known data type;
2. Size (or shape): the number of dimensions (rank) and the length of each dimension.
The lengths of some dimensions of a Tensor may be unknown when the model is defined and only determined when the algorithm actually runs, for example the number of samples in a mini-batch (the batch size), or the maximum sequence length in a mini-batch.
Tensor in PaddleFluid
"""""""""""""""""""""
PaddleFluid also uses Tensor as the uniform representation of the input and output data of a neural network. The Tensor concept is identical across today's mainstream deep learning platforms and migrates seamlessly between frameworks.
Fluid likewise has three special kinds of Tensor:
1. Learnable parameters of the model
The learnable parameters of a model live as long as the whole training task and are updated by the optimization algorithm. In PaddleFluid they too are represented by :code:`Variable`;
in the vast majority of cases the user does not need to create the network's learnable parameters directly, as Fluid provides wrappers for almost all common basic neural network building blocks.
Taking the simplest fully connected model as an example, the code snippet below directly creates the connection weights :code:`W` and the bias (:code:`bias`), two learnable parameters of the fully connected layer,
with no explicit call to variable-creation interfaces.
::
import paddle.fluid as fluid
x = fluid.layers.data(name="x", shape=[32], dtype="float32")
y = fluid.layers.fc(input=x, size=128, bias_attr=True)
2. Input and output Tensors
The input data of the whole neural network is also a special Tensor, one in which
the sizes of some dimensions cannot be determined when the model is defined (usually the batch size;
if data varies between mini-batches, also the maximum sequence length, the image width and height, etc.) and need placeholders at model-definition time.
PaddleFluid uses :code:`fluid.layers.data` to receive input data; :code:`fluid.layers.data` requires the shape of the input Tensor.
When a dimension cannot be determined, specify None in that position, as the code snippet below shows:
::
import paddle.fluid as fluid
x = fluid.layers.data(name="x", shape=[2, None, 3], dtype="int64")
3. Constant Tensors are implemented in PaddleFluid by combining a Tensor with :code:`fluid.layers.assign`.
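A minimal sketch of a constant Tensor (assuming :code:`fluid.layers.assign` accepts a NumPy array, as in Fluid 1.x):

.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    # a constant 2x2 tensor baked into the Program from a numpy array
    const = fluid.layers.assign(
        np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32))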
Computation Primitive: Operation/Operator
-----------------------------------------
Tensor is the uniform data representation of all mainstream deep learning frameworks today (inputs, outputs, intermediate results, and the model's learnable parameters are all Tensors).
Operations on data, meanwhile, are also highly uniform across mainstream frameworks: the Operator/Operation,
customarily called an "operator" (算子) in Chinese.
Note: PaddleFluid uses the term Operator for operations on Tensors.
An Operation/Operator accepts several Tensors as input, outputs several Tensors, and represents a transformation from input to output.
Operator in PaddleFluid
"""""""""""""""""""""""
All operators supported by PaddleFluid are listed in the `API reference <http://www.paddlepaddle.org/docs/develop/api/en/fluid/layers.html>`_.
For ease of use, on the Python side Fluid's Operators are further wrapped into modules such as :code:`paddle.fluid.layers`
and :code:`paddle.fluid.networks`. The reason is that a common Tensor operation may be composed of several more basic operations;
for example, l2 norm is internally computed by a combination of several Operators such as reduce, elementwise_add, and scale.
To make these more convenient to use, the framework wraps the basic Operators, including creating the learnable parameters an Operator depends on,
handling the details of parameter initialization, and so on, reducing the cost of repeated development for users.
All deep learning frameworks face the same wrapping problem. In the vast majority of cases, users rarely interact with the framework's low-level Operators directly; they use the layers, networks, and similar modules the framework provides, which reduces the amount of code to develop. Whatever the concepts are called, their essence and role are the same across frameworks: transformations of Tensors.
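A minimal sketch of this wrapping (assuming the Fluid 1.x :code:`Program` introspection API): a single :code:`fluid.layers.fc` call is lowered into basic Operators, which can be listed from the default Program:

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name="x", shape=[8], dtype="float32")
    y = fluid.layers.fc(input=x, size=4)

    # the fc layer inserted basic operators such as mul / elementwise_add,
    # and created their learnable parameters behind the scenes
    ops = fluid.default_main_program().global_block().ops
    print([op.type for op in ops])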
Summary
>>>>>>>
Whether called Operation, Operator, or layers, they have the same meaning and role across deep learning platforms: transformations of Tensors. They are the basic computational capability a deep learning platform provides and can be looked up in each platform's API reference.
Today, with all major platforms having joined the ONNX project, the basic operators each platform offers have converged; at the same time, each platform keeps its own characteristic operators that ease development for particular kinds of tasks.
Building and Running a Model
----------------------------
The whole training task runs as follows:
Program and Executor in Fluid
"""""""""""""""""""""""""""""
1. Fluid uses a :code:`Program` to describe a neural network model; for the user there is no concept of a computation graph.
All Tensors the user defines, and the Operators acting on those Tensors, are added to a :code:`Program`;
a Program consists of nested :code:`Block` s, but the user neither needs to create :code:`Block` s explicitly nor needs to be aware of their existence;
in a Fluid program, :code:`Block` s are created by special :code:`Operator` s such as :code:`while_op`, :code:`if_op`, and :code:`parallel_do` when they are invoked;
for the user, it is enough to know that one is adding variables (:code:`Tensor`) and operations (:code:`Operator`) to a Fluid Program.
2. Fluid uses an :code:`Executor` to execute a Fluid :code:`Program`.
To further understand the role of the :code:`Executor` in Fluid, we first need to explain the execution flow of a Fluid program. The figure below shows the execution flow of a Fluid program on a single machine:
.. figure:: fluid_local_train.jpeg
:scale: 50%
:align: center
Figure 1
Execution flow of a local Fluid training job
1. Fluid's design philosophy is very similar in spirit to a programming language; as with writing programs in a compiled high-level language such as C++/Java, the execution of a Fluid program is split into two important stages: compile time and run time;
2. At compile time, the user calls the operators Fluid provides to add variables (Tensors) and operations on those variables (Operators or Layers) to a :code:`Program`. The user only describes the core forward computation and need not care about the backward computation, nor about how computation is done in distributed settings or on heterogeneous devices;
3. The original :code:`Program` is converted inside the platform into an intermediate description language: the :code:`ProgramDesc`;
4. The most important functional module at compile time is the Transpiler. A Transpiler accepts a :code:`ProgramDesc` and outputs a transformed :code:`ProgramDesc` that becomes the :code:`Fluid Program` the backend Executor ultimately executes;
The most commonly used Transpilers include:
1. the memory-optimization Transpiler, which analyzes read/write dependencies between variables and inserts memory-release Operators to keep memory usage low during execution;
2. the distributed-environment Transpiler, which accepts the user-defined local Program and generates the two :code:`Program` s executed by the Parameter Client and the Parameter Server.
5. The backend Executor accepts the :code:`Program` output by the Transpiler and executes its Operators in order (comparable to instructions in a programming language), creating and managing the inputs and outputs each Operator needs during execution.
From the process above we can see that the execution of a Fluid program divides into defining the :code:`Program` at compile time and creating an :code:`Executor` to run the :code:`Program`.
The process in which an :code:`Executor` runs a :code:`Program` is neither interactive nor interruptible.
Fluid allows more than one :code:`Program` to be created. By default, a PaddleFluid program contains 2 Programs:
1. :code:`fluid.framework.default_startup_program`: defines the creation of model parameters, inputs and outputs, and the initialization of the model's learnable parameters, among other operations;
- :code:`default_startup_program` is generated automatically by the framework and need not be created explicitly;
- if a call changes the default initialization of a parameter, the framework automatically adds the relevant change to :code:`default_startup_program`.
2. :code:`fluid.framework.default_main_program`: defines the neural network model, the forward and backward computation, and the updates the optimization algorithm applies to the network's learnable parameters;
- the heart of using Fluid is building the :code:`default_main_program`.
3. :code:`Scope` in PaddleFluid is similar to the collection concept in TensorFlow, but in Fluid :code:`Scope` is a backend concept that users cannot manipulate directly, so it can be ignored when using the framework.
Summary
"""""""
Fluid executes a user-defined Fluid :code:`Program` through the Executor.
1. The Executor connects Fluid's frontend and backend;
2. The Executor accepts the user-defined original model (a :code:`Program`) and transforms and optimizes the original :code:`Program` by invoking the various :code:`Transpiler` s in the system.
A Complete Example: Training a Machine Learning Model
=====================================================
In this section we take the MNIST handwritten digit recognition problem, the "Hello World" problem and dataset of machine learning, and use a complete runnable example to learn how the concepts introduced above are used on the PaddleFluid platform.
Step 1: Define the Data
-----------------------
PaddleFluid receives input data with :code:`fluid.layers.data`.
::
import numpy as np
import paddle
import paddle.fluid as fluid
# define the input layers for the network.
x = fluid.layers.data(name="img", shape=[1, 28, 28], dtype="float32")
y_ = fluid.layers.data(name="label", shape=[1], dtype="int64")
In Fluid, dimension 0 of a Tensor is fixed as the batch size. In the code segment above, the image input :code:`x` has shape [1, 28, 28]; the three dimensions are the number of channels, the image height, and the image width.
Inside the Fluid framework an image input is in fact a 4-D Tensor whose dimension 0 is the batch size. The framework fills in the batch size placeholder automatically, so you need not specify one for the batch size.
Apart from the batch size (dimension 0), if the size of some dimension of a Tensor can only be determined at run time, you can put :code:`None` directly in that position as a placeholder.
Step 2: Define the Model
------------------------
Define a neural network with one hidden layer by calling the operators Fluid provides. A Fluid model consists of two parts: the model structure and the optimization method. This is very similar to a TensorFlow program, and the concepts carry over directly.
::
# define the network topology.
y = fluid.layers.fc(input=x, size=10, act="softmax")
loss = fluid.layers.cross_entropy(input=y, label=y_)
avg_loss = fluid.layers.mean(loss)
# define the optimization algorithm.
optimizer = fluid.optimizer.Adam(learning_rate=1e-3)
optimizer.minimize(avg_loss)
Fluid describes a model with a Program rather than a computation graph. Usually the user need not care about the Program's details: calling the layers above inserts variables (Tensors) and operations on them (the layers and optimizer in the code above) into a global Program, :code:`fluid.framework.default_main_program`.
Step 3: Parameter Initialization
--------------------------------
As introduced above, the Executor in a Fluid program is the interface connecting Fluid's frontend and backend.
By default a Fluid model contains at least two Programs. The :code:`Program` that initializes the network's learnable parameters is called :code:`fluid.default_startup_program()`.
Only an executor can execute a Fluid Program, so before initializing the network's learnable parameters we must first create a Fluid executor.
::
# define the executor.
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
In the code segment above, :code:`place` tells the executor on what kind of device a Fluid Program executes;
common choices are :code:`fluid.CPUPlace()` and :code:`fluid.CUDAPlace()`.
Step 4: Feed Data and Run Training
----------------------------------
The neural network model defined in Step 2 ends up inserted into a Fluid Program called :code:`fluid.framework.default_main_program`.
After the network's learnable parameters are initialized, training proceeds by having the executor Executor run this :code:`fluid.framework.default_main_program`.
::
train_reader = paddle.batch(
paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=5000),
batch_size=BATCH_SIZE)
feeder = fluid.DataFeeder(place=place, feed_list=[x, y_])
for pass_id in range(100):
for batch_id, data in enumerate(train_reader()):
loss = exe.run(
fluid.framework.default_main_program(),
feed=feeder.feed(data),
fetch_list=[avg_loss])
print("Cur Cost : %f" % (np.array(loss[0])[0]))
As the code snippet above shows, the training process of a Fluid program is very close to that of a TensorFlow program:
everything sits in a :code:`for` loop that reads a mini-batch of data per iteration
and calls the executor to run the Fluid :code:`default_main_program`: receive the mini-batch input, and run the forward, backward, and parameter-update computations on it.
`Note: the program above uses the MNIST data built into Fluid, exactly the same MNIST data provided to the TensorFlow example program.`
Step 5: Observe the Model
-------------------------
The steps above form a complete Fluid training program. Observing the loss once per batch gives a direct view of the model's progress:
.. figure:: fluid_mnist.png
:scale: 40%
:align: center
Figure 2
Cost curve of the Fluid MNIST handwritten digit recognition task
Appendix: Complete Code
-----------------------
::
import numpy as np
import paddle
import paddle.fluid as fluid
def main():
BATCH_SIZE = 128
# define the input layers for the network.
x = fluid.layers.data(name="img", shape=[1, 28, 28], dtype="float32")
y_ = fluid.layers.data(name="label", shape=[1], dtype="int64")
# define the network topology.
y = fluid.layers.fc(input=x, size=10, act="softmax")
loss = fluid.layers.cross_entropy(input=y, label=y_)
avg_loss = fluid.layers.mean(loss)
optimizer = fluid.optimizer.Adam(learning_rate=5e-3)
optimizer.minimize(avg_loss)
# define the executor.
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
train_reader = paddle.batch(
paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=5000),
batch_size=BATCH_SIZE)
feeder = fluid.DataFeeder(place=place, feed_list=[x, y_])
for pass_id in range(100):
for batch_id, data in enumerate(train_reader()):
loss = exe.run(
fluid.framework.default_main_program(),
feed=feeder.feed(data),
fetch_list=[avg_loss])
print("Cur Cost : %f" % (np.array(loss[0])[0]))
if __name__ == "__main__":
main()
##############
Basic Concepts
##############
This section introduces the basic concepts of the core PaddlePaddle framework:
- `Programming Guide <./programming_guide/programming_guide.html>`_ : introduces the basic concepts and usage of PaddlePaddle.
- `Variable <variable.html>`_ : a Variable can hold any type of value in PaddlePaddle; in most cases it is a LoD-Tensor.
- `Tensor <tensor.html>`_ : a Tensor represents data.
- `LoD-Tensor <lod_tensor.html>`_ : LoD-Tensor is an advanced PaddlePaddle feature that attaches sequence information to a Tensor and supports processing variable-length data.
- `Operator <operator.html>`_ : an Operator represents an operation on data.
- `Program <program.html>`_ : a Program describes the computation process.
- `Executor <executor.html>`_ : the Executor is the execution engine.
- `Imperative Programming (Dynamic Graph) Mechanism - DyGraph <./dygraph/DyGraph.html>`_ : introduces PaddlePaddle's imperative execution mechanism.
.. toctree::
:hidden:
programming_guide/programming_guide.md
variable.rst
tensor.rst
lod_tensor.rst
operator.rst
program.rst
executor.rst
dygraph/DyGraph.md
#############
Basic Concept
#############
This section introduces the basic concepts of Paddle:
- `Guide to Fluid Programming <./programming_guide/programming_guide_en.html>`_ :introduces the basic concept and usage of Paddle.
- `LoD-Tensor User Guide <lod_tensor_en.html>`_ : LoD-Tensor is a high-level feature of Paddle. It adds sequence information on the basis of tensor and supports processing variable length data.
.. toctree::
:hidden:
programming_guide/programming_guide_en.md
lod_tensor_en.rst
.. _cn_user_guide_lod_tensor:
=========
LoDTensor
=========
A LoD (Level-of-Detail) Tensor is an advanced Paddle feature and an extension of Tensor. LoDTensor trades flexibility for training efficiency.
**Note: most users do not need to care about LoDTensor usage.**
Solutions for Variable-Length Sequences
=======================================
Mainstream training frameworks all train in batches, a batch containing multiple samples. In NLP tasks a batch contains N sentences, and the sentence lengths may differ. To handle this length mismatch, Paddle offers two solutions: 1) padding, i.e. appending (or prepending) padding ids to the sentences (the recommended approach); 2) LoDTensor, a tensor that also stores the sequence length information.
Padding increases the framework's computation, but for most NLP tasks, bucketing, sorting, and similar mechanisms can make the sentence lengths within a batch close enough to keep the fraction of padding ids low, so padding's effect on the training cost is negligible. Introducing a mask (recording which positions are padding ids) also removes the effect of the padding ids on training quality; a hedged sketch of padding with a mask appears at the end of this section.
For some NLP tasks, however, the sentence lengths within a batch cannot be made similar. For example, a chat task must compute the similarity between a query and several answers; the answers must sit in one batch, and their lengths can differ enormously, the longest running to hundreds of tokens and the shortest to a dozen or so. With padding, the computation would grow by tens of times; this scenario suits LoDTensor very well. Because LoDTensor stores the length information of the samples, no padding words need to be added, which greatly reduces the computation and thus speeds up training.
LoDTensor concatenates the length-varying dimension into one large dimension and introduces an index data structure (the LoD) to split the tensor into sequences. After this concatenation the rank differs from the padded layout, and some computations (such as dot attention) have more complex logic than with padding.
**Note: if sorting, bucketing, and similar techniques cannot make the sample lengths within a batch very close, we recommend LoDTensor; in other cases, we recommend building the network with padding.**
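A minimal, framework-agnostic NumPy sketch of the padding approach described above (the helper name is hypothetical): pad a batch of variable-length sequences to the batch maximum and build the mask marking real positions:

.. code-block:: python

    import numpy as np

    def pad_batch(seqs, pad_id=0):
        # pad variable-length integer sequences to the batch maximum and
        # return the mask marking real (non-padding) positions
        max_len = max(len(s) for s in seqs)
        batch = np.full((len(seqs), max_len), pad_id, dtype=np.int64)
        mask = np.zeros((len(seqs), max_len), dtype=np.float32)
        for i, s in enumerate(seqs):
            batch[i, :len(s)] = s
            mask[i, :len(s)] = 1.0
        return batch, mask

    batch, mask = pad_batch([[3, 1, 4], [1], [5, 9]])
    # multiplying per-token losses by `mask` cancels the padding positions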
LoD Index
=========
To better understand the concept of LoD, this section provides several examples:
**A mini-batch of sentences**
Suppose a mini-batch contains 3 sentences of 3, 1, and 2 words respectively. The mini-batch can be represented by a (3+1+2)xD Tensor plus some index information:
.. code-block :: text
3 1 2
| | | | | |
Above, each :code:`|` represents a D-dimensional word vector, and the digits 3, 1, 2 form a 1-level LoD.
**Recursive sequences**
Consider another example, a 2-level LoD-Tensor: suppose a mini-batch contains three articles of 3, 1, and 2 sentences respectively, each sentence composed of a different number of words. The mini-batch then looks like:
.. code-block:: text
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
The corresponding LoD is:
.. code-block:: text
[[3,1,2]/*level=0*/,[3,2,4,1,2,3]/*level=1*/]
**A mini-batch of videos**
Vision tasks often process high-dimensional objects such as videos and images. Suppose a mini-batch contains 3 videos with 3, 1, and 2 frames respectively, all frames of the same size 640x480. The mini-batch can then be represented as:
.. code-block:: text
3 1 2
口口口 口 口口
The bottom tensor has size (3+1+2)x640x480; each :code:`口` represents a 640x480 image.
**A mini-batch of images**
Traditionally, for a mini-batch of N images of fixed size, the LoD-Tensor is represented as:
.. code-block:: text
1 1 1 1 1
口口口口 ... 口
In this case we do not lose information just because every index is 1; we simply treat the LoD-Tensor as an ordinary tensor:
.. code-block:: text
口口口口 ... 口
**Model parameters**
A model parameter is just an ordinary tensor; in Fluid it is represented as a 0-level LoD-Tensor.
Offset Representation of LoDTensor
==================================
For fast access to the basic sequences, Fluid provides an offset representation: store the positions where each sequence starts and ends rather than its length.
In the example above, from the lengths of the basic elements:
.. code-block:: text
3 2 4 1 2 3
we can convert to the offset representation:
.. code-block:: text
0 3 5 9 10 12 15
= = = = = =
3 2+3 4+5 1+9 2+10 3+12
So we know the first sentence spans words 0 to 3, and the second spans words 3 to 5.
Similarly, the top-level lengths of the LoD
.. code-block:: text
3 1 2
can be converted to offset form:
.. code-block:: text
0 3 4 6
= = =
3 3+1 4+2
So the offset representation of this LoD-Tensor is:
.. code-block:: text
0 3 4 6
3 5 9 10 12 15
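A minimal Python sketch of the length-to-offset conversion described above (the helper name is hypothetical):

.. code-block:: python

    def lengths_to_offsets(lengths):
        # convert per-sequence lengths to the offset form used internally
        offsets = [0]
        for n in lengths:
            offsets.append(offsets[-1] + n)
        return offsets

    print(lengths_to_offsets([3, 1, 2]))           # [0, 3, 4, 6]
    print(lengths_to_offsets([3, 2, 4, 1, 2, 3]))  # [0, 3, 5, 9, 10, 12, 15]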
LoD-Tensor
=============
A LoD-Tensor can be viewed as a tree in which the leaves are the basic sequence elements and the branches mark the boundaries between elements.
Fluid expresses the sequence information of a LoD-Tensor in two forms: raw lengths and offsets. Internally, Paddle uses the offset form for faster sequence access; the Python API uses the raw-length form, which is easier for users to understand and compute with, and calls the raw lengths :code:`recursive_sequence_lengths`.
Taking the 2-level LoD-Tensor mentioned above as an example:
.. code-block:: text
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
- expressed as offsets: [ [0,3,4,6] , [0,3,5,9,10,12,15] ],
- expressed as raw lengths: recursive_sequence_lengths=[ [3-0 , 4-3 , 6-4] , [3-0 , 5-3 , 9-5 , 10-9 , 12-10 , 15-12] ].
Taking a text sequence as an example: [3,1,2] means this mini-batch holds 3 articles of 3, 1, and 2 sentences respectively, and [3,2,4,1,2,3] means those sentences contain 3, 2, 4, 1, 2, and 3 words respectively.
recursive_seq_lens is a doubly nested list, i.e. a list of lists. The size of the outer list is the number of nesting levels, i.e. the LoD level; each inner list gives the size of every element at the corresponding LoD level.
The following three code snippets show how to create a LoD-Tensor, how to convert a LoD-Tensor to a Tensor, and how to convert a Tensor to a LoD-Tensor:
* Create a LoD-Tensor
.. code-block:: python
# create a lod-tensor
import paddle.fluid as fluid
import numpy as np
a = fluid.create_lod_tensor(np.array([[1],[1],[1],
[1],[1],
[1],[1],[1],[1],
[1],
[1],[1],
[1],[1],[1]]).astype('int64') ,
[[3,1,2] , [3,2,4,1,2,3]],
fluid.CPUPlace())
# check the number of nested LoD levels
print (len(a.recursive_sequence_lengths()))
# output: 2
# check the number of basic elements
print (sum(a.recursive_sequence_lengths()[-1]))
# output: 15 (3+2+4+1+2+3=15)
* Convert a LoD-Tensor to a Tensor
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
# create a LoD-Tensor
a = fluid.create_lod_tensor(np.array([[1.1], [2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], fluid.CPUPlace())
def LodTensor_to_Tensor(lod_tensor):
# get the lod information of the LoD-Tensor
lod = lod_tensor.lod()
# convert to an array
array = np.array(lod_tensor)
new_array = []
# split into Tensors according to the level information of the original LoD-Tensor
for i in range(len(lod[0]) - 1):
new_array.append(array[lod[0][i]:lod[0][i + 1]])
return new_array
new_array = LodTensor_to_Tensor(a)
# print the result
print(new_array)
* Convert a Tensor to a LoD-Tensor
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
def to_lodtensor(data, place):
# store the lengths of the Tensors as the LoD information
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
# flatten the Tensor to be converted
flattened_data = np.concatenate(data, axis=0).astype("float32")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
# attach the lod information to the Tensor data
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
# new_array is the Tensor converted in the previous snippet
lod_tensor = to_lodtensor(new_array,fluid.CPUPlace())
# print the LoD information
print("The LoD of the result: {}.".format(lod_tensor.lod()))
# check consistency with the original Tensor data
print("The array : {}.".format(np.array(lod_tensor)))
Code Example
============
The code in this section expands the input variable x according to the specified level of y-lod. The example brings together several key LoD-Tensor concepts; by following the code, you will:
- gain a direct understanding of how Fluid's :code:`fluid.layers.sequence_expand` works
- learn how to create a LoD-Tensor in Fluid
- learn how to print the contents of a LoDTensor
**Define the computation**
layers.sequence_expand expands the data of x using the lod value of y. For the functional description of :code:`fluid.layers.sequence_expand`, please first read :ref:`cn_api_fluid_layers_sequence_expand`.
Sequence expansion code:
.. code-block:: python
x = fluid.layers.data(name='x', shape=[1], dtype='float32', lod_level=1)
y = fluid.layers.data(name='y', shape=[1], dtype='float32', lod_level=2)
out = fluid.layers.sequence_expand(x=x, y=y, ref_level=0)
*Note*: the shape of the output LoD-Tensor depends only on the real data fed in; the shape values set for x and y when defining the network structure are placeholders only and do not affect the result.
**Create the Executor**
.. code-block:: python
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
**Prepare the data**
Here we call :code:`fluid.create_lod_tensor` to create the inputs of :code:`sequence_expand`: the LoD defined for y_d determines how x_d is expanded. The output depends only on the LoD of y_d; the data of y_d takes no part in the computation and only needs to match LoD[-1] in dimension.
For usage of :code:`fluid.create_lod_tensor()`, see :ref:`cn_api_fluid_create_lod_tensor`.
The implementation:
.. code-block:: python
x_d = fluid.create_lod_tensor(np.array([[1.1],[2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], place)
y_d = fluid.create_lod_tensor(np.array([[1.1],[1.1],[1.1],[1.1],[1.1],[1.1]]).astype('float32'), [[1,3], [2,1,2,1]],place)
**Run the computation**
In Fluid, a Tensor with LoD > 1 is fed like data of any other type, with :code:`feed` defining the input order. In addition, because the output results are Tensors carrying LoD information, pass :code:`return_numpy=False` to exe.run( ) to obtain the LoD-Tensor output.
.. code-block:: python
results = exe.run(fluid.default_main_program(),
feed={'x':x_d, 'y': y_d },
fetch_list=[out],return_numpy=False)
**Inspect the LoDTensor result**
Because of its special properties, a LoDTensor cannot be inspected by printing it directly. The usual approach is to fetch the LoD-Tensor as a network output and then run numpy.array(lod_tensor) to convert it into a numpy array:
.. code-block:: python
np.array(results[0])
The output is:
.. code-block:: text
array([[1.1],[2.2],[3.3],[4.4],[2.2],[3.3],[4.4],[2.2],[3.3],[4.4]])
**Inspect the sequence lengths**
The recursive sequence lengths of the LoDTensor can be obtained as follows:
.. code-block:: python
results[0].recursive_sequence_lengths()
The output is:
.. code-block:: text
[[1L, 3L, 3L, 3L]]
**Complete code**
You can run the complete code below and observe the output:
.. code-block:: python
# load libraries
import paddle
import paddle.fluid as fluid
import numpy as np
# define the forward computation
x = fluid.layers.data(name='x', shape=[1], dtype='float32', lod_level=1)
y = fluid.layers.data(name='y', shape=[1], dtype='float32', lod_level=2)
out = fluid.layers.sequence_expand(x=x, y=y, ref_level=0)
# define the place of computation
place = fluid.CPUPlace()
# create the executor
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# create LoDTensors
x_d = fluid.create_lod_tensor(np.array([[1.1], [2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], place)
y_d = fluid.create_lod_tensor(np.array([[1.1],[1.1],[1.1],[1.1],[1.1],[1.1]]).astype('float32'), [[1,3], [1,2,1,2]], place)
# run the computation
results = exe.run(fluid.default_main_program(),
feed={'x':x_d, 'y': y_d },
fetch_list=[out],return_numpy=False)
# print the result data
print("The data of the result: {}.".format(np.array(results[0])))
# print the sequence lengths of the result
print("The recursive sequence lengths of the result: {}.".format(results[0].recursive_sequence_lengths()))
# print the LoD of the result
print("The LoD of the result: {}.".format(results[0].lod()))
FAQ:
=======
Q: How do I print the LoD information of a variable?
A:
1. Use `executor.run` to fetch the `variable` you want to inspect and then print its lod information; note that the `return_numpy` parameter of the `executor.run` method must be set to `False` at run time.
.. code-block:: python
results = exe.run(fluid.default_main_program(),
feed={'x':x_d, 'y': y_d },
fetch_list=[out],return_numpy=False)
lod_tensor = results[0]
print (lod_tensor.lod())
2. Use fluid.layers.Print():
.. code-block:: python
y = fluid.layers.data(name='y', shape=[1], dtype='float32', lod_level=2)
fluid.layers.Print(y)
Summary
========
By now you should have a basic grasp of the LoD-Tensor concept. Try modifying x_d and y_d in the code above and observing the output; it will help you better understand this flexible structure.
For more model applications of LoDTensor, see the `Word2vec <../../../beginners_guide/basics/word2vec/index.html>`_ , `Recommender System <../../../beginners_guide/basics/recommender_system/index.html>`_ , and `Sentiment Analysis <../../../beginners_guide/basics/understand_sentiment/index.html>`_ tutorials in the beginner's guide.
For more advanced application examples, see the related content in the `model zoo <../../../user_guides/models/index_cn.html>`_ .
.. _user_guide_lod_tensor:
#####################
LoD-Tensor User Guide
#####################
LoD (Level-of-Detail) Tensor is a term unique to Fluid; it is constructed by appending sequence information to a Tensor. Data transferred in Fluid, including the network's inputs, outputs, and learnable parameters, are all represented by LoD-Tensor.
With the help of this user guide, you will learn the design idea of LoD-Tensor in Fluid so that you can use such a data type more flexibly.
Challenge of variable-length sequences
======================================
In most deep learning frameworks, a mini-batch is represented by Tensor.
For example, if there are 10 pictures in a mini-batch and the size of each picture is 32*32, the mini-batch will be a 10*32*32 Tensor.
Or in the NLP task, there are N sentences in a mini-batch and the length of each sentence is L. Every word is represented by a one-hot vector with D dimensions. Then the mini-batch can be represented by an N*L*D Tensor.
In the two examples above, the size of each sequence element remains the same. However, the data to be trained are variable-length sequences in many cases. For this scenario, method to be taken in most frameworks is to set a fixed length and sequence data shorter than the fixed length will be padded with 0 to reach the fixed length.
Owing to the LoD-Tensor in Fluid, it is not necessary to keep the lengths of sequence data in every mini-batch constant. Therefore tasks sensitive to sequence formats, like NLP, can be handled without padding.
Index Data Structure (LoD) is introduced to Fluid to split Tensor into sequences.
Index Structure - LoD
======================
To have a better understanding of the concept of LoD, you can refer to the examples in this section.
**mini-batch consisting of sentences**
Suppose a mini-batch contains three sentences, and each contains 3, 1, 2 words respectively. Then the mini-batch can be represented by a (3+1+2)*D Tensor with some index information appended:
.. code-block :: text
3 1 2
| | | | | |
In the text above, each :code:`|` represents a word vector with D dimensions, and the digits 3, 1, 2 make up a 1-level LoD.
**recursive sequence**
Take a 2-level LoD-Tensor for example, a mini-batch contains articles of 3 sentences, 1 sentence and 2 sentences. The number of words in every sentence is different. Then the mini-batch is formed as follows:
.. code-block:: text
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
the LoD to express the format:
.. code-block:: text
[[3,1,2]/*level=0*/,[3,2,4,1,2,3]/*level=1*/]
**mini-batch consisting of video data**
In computer vision tasks, one often needs to deal with high-dimensional objects such as videos and pictures. Suppose a mini-batch contains 3 videos, composed of 3 frames, 1 frame, and 2 frames respectively, each frame of the same size 640*480. Then the mini-batch can be described as:
.. code-block:: text
3 1 2
口口口 口 口口
The size of the tensor at the bottom is (3+1+2)*640*480. Every :code:`口` represents a 640*480 picture.
**mini-batch consisting of pictures**
Traditionally, for a mini-batch of N pictures with fixed size, LoD-Tensor is described as:
.. code-block:: text
1 1 1 1 1
口口口口 ... 口
Under such circumstances, we do not lose information merely because all the indices are 1; we simply treat the LoD-Tensor as a common tensor:
.. code-block:: text
口口口口 ... 口
**model parameter**
A model parameter is a common tensor, described in Fluid as a 0-level LoD-Tensor.
LoDTensor expressed by offset
=============================
For quick access to the underlying sequences, Fluid provides an offset representation: store the positions where each sequence starts and ends instead of its length.
In the example above, you can compute the length of fundamental elements:
.. code-block:: text
3 2 4 1 2 3
It is expressed by offset as follows:
.. code-block:: text
0 3 5 9 10 12 15
= = = = = =
3 2+3 4+5 1+9 2+10 3+12
Therefore we infer that the first sentence spans words 0 to 3 and the second sentence spans words 3 to 5.
Similarly, the lengths in the top level of the LoD
.. code-block:: text
3 1 2
It can be expressed by offset:
.. code-block:: text
0 3 4 6
= = =
3 3+1 4+2
Therefore the LoD-Tensor is expressed by offset:
.. code-block:: text
0 3 4 6
3 5 9 10 12 15
LoD-Tensor
=============
A LoD-Tensor can be regarded as a tree whose leaves are the basic sequence elements and whose branches mark the boundaries between elements.
Fluid expresses the sequence information of a LoD-Tensor in two ways: primitive lengths and offsets. Paddle uses the offset form internally for quicker sequence access; the Python API uses the primitive-length form, which is easier to understand and compute with, and names the primitive lengths :code:`recursive_sequence_lengths`.
Take a 2-level LoD-Tensor mentioned above as an example:
.. code-block:: text
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
- LoD-Tensor expressed by offsets: [ [0,3,4,6] , [0,3,5,9,10,12,15] ]
- LoD-Tensor expressed by primitive lengths: recursive_sequence_lengths=[ [3-0 , 4-3 , 6-4] , [3-0 , 5-3 , 9-5 , 10-9 , 12-10 , 15-12] ]
Taking a text sequence as an example, [3,1,2] indicates there are 3 articles in the mini-batch, containing 3, 1, and 2 sentences respectively; [3,2,4,1,2,3] indicates those sentences contain 3, 2, 4, 1, 2, and 3 words respectively.
recursive_seq_lens is a doubly nested list, in other words a list of lists. The size of the outermost list is the number of nesting levels, i.e. the LoD level; each inner list gives the size of every element at the corresponding LoD level.
The following three pieces of code show how to create a LoD-Tensor, how to convert a LoD-Tensor to a Tensor, and how to convert a Tensor to a LoD-Tensor respectively:
* Create LoD-Tensor
.. code-block:: python
#Create lod-tensor
import paddle.fluid as fluid
import numpy as np
a = fluid.create_lod_tensor(np.array([[1],[1],[1],
[1],[1],
[1],[1],[1],[1],
[1],
[1],[1],
[1],[1],[1]]).astype('int64') ,
[[3,1,2] , [3,2,4,1,2,3]],
fluid.CPUPlace())
#Check lod-tensor nested layers
print (len(a.recursive_sequence_lengths()))
# output: 2
#Check the number of the most fundamental elements
print (sum(a.recursive_sequence_lengths()[-1]))
# output:15 (3+2+4+1+2+3=15)
* Transform LoD-Tensor to Tensor
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
# create LoD-Tensor
a = fluid.create_lod_tensor(np.array([[1.1], [2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], fluid.CPUPlace())
def LodTensor_to_Tensor(lod_tensor):
# get lod information of LoD-Tensor
lod = lod_tensor.lod()
# transform into array
array = np.array(lod_tensor)
new_array = []
# transform to Tensor according to the layer information of the original LoD-Tensor
for i in range(len(lod[0]) - 1):
new_array.append(array[lod[0][i]:lod[0][i + 1]])
return new_array
new_array = LodTensor_to_Tensor(a)
# output the result
print(new_array)
* Transform Tensor to LoD-Tensor
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
def to_lodtensor(data, place):
# save the length of Tensor as LoD information
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
# flatten the Tensor to be converted
flattened_data = np.concatenate(data, axis=0).astype("float32")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
# add lod information to Tensor data
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
# new_array is the transformed Tensor above
lod_tensor = to_lodtensor(new_array,fluid.CPUPlace())
# output LoD information
print("The LoD of the result: {}.".format(lod_tensor.lod()))
# examine the consistency with Tensor data
print("The array : {}.".format(np.array(lod_tensor)))
Code examples
==============
In this section's code example, the input variable x is expanded according to the specified level of y-lod. The example covers several fundamental LoD-Tensor concepts; by following the code, you will
- Have a direct understanding of the implementation of :code:`fluid.layers.sequence_expand` in Fluid
- Know how to create LoD-Tensor in Fluid
- Learn how to print the content of LoDTensor
**Define the Process of Computing**
layers.sequence_expand expands x by obtaining the lod value of y. For more explanation of :code:`fluid.layers.sequence_expand`, please read :ref:`api_fluid_layers_sequence_expand` first.
Code of sequence expanding:
.. code-block:: python
x = fluid.layers.data(name='x', shape=[1], dtype='float32', lod_level=1)
y = fluid.layers.data(name='y', shape=[1], dtype='float32', lod_level=2)
out = fluid.layers.sequence_expand(x=x, y=y, ref_level=0)
*Note*: the shape of the output LoD-Tensor depends only on the real data fed in; the shape values set for x and y when defining the network structure are placeholders only and do not affect the result.
**Create Executor**
.. code-block:: python
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
**Prepare Data**
Here we use :code:`fluid.create_lod_tensor` to create the input data of :code:`sequence_expand`, expanding x_d according to the LoD defined for y_d. The output depends only on the LoD of y_d; the data of y_d is not involved in the computation and only needs to match its LoD[-1] in dimension.
About the user guide of :code:`fluid.create_lod_tensor()` , please refer to :ref:`api_fluid_create_lod_tensor` .
Code
.. code-block:: python
x_d = fluid.create_lod_tensor(np.array([[1.1],[2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], place)
y_d = fluid.create_lod_tensor(np.array([[1.1],[1.1],[1.1],[1.1],[1.1],[1.1]]).astype('float32'), [[1,3], [2,1,2,1]],place)
**Execute Computing**
In Fluid, data whose LoD level > 1, like data of other types, is transferred in the order defined by :code:`feed` . In addition, the parameter :code:`return_numpy=False` needs to be added to exe.run() to fetch the LoD-Tensor output, because the results are Tensors carrying LoD information.
.. code-block:: python
results = exe.run(fluid.default_main_program(),
feed={'x':x_d, 'y': y_d },
fetch_list=[out],return_numpy=False)
**Check the result of LodTensor**
Because of the special attributes of LoDTensor, you cannot print it directly to inspect its content. The usual solution is to fetch the LoDTensor as a network output and then run numpy.array(lod_tensor) to convert it into a numpy array:
.. code-block:: python
np.array(results[0])
Output:
.. code-block:: text
array([[1.1],[2.2],[3.3],[4.4],[2.2],[3.3],[4.4],[2.2],[3.3],[4.4]])
**Check the length of sequence**
You can check the lengths of the expanded sequences by querying the recursive sequence lengths of the LoDTensor:
.. code-block:: python
results[0].recursive_sequence_lengths()
Output:

.. code-block:: text

    [[1, 3, 3, 3]]
**Complete Code**
You can check the output by executing the following complete code:
.. code-block:: python
#Load
import paddle
import paddle.fluid as fluid
import numpy as np
#Define forward computation
x = fluid.layers.data(name='x', shape=[1], dtype='float32', lod_level=1)
y = fluid.layers.data(name='y', shape=[1], dtype='float32', lod_level=2)
out = fluid.layers.sequence_expand(x=x, y=y, ref_level=0)
#Define place for computation
place = fluid.CPUPlace()
#Create executor
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
#Create LoDTensor
x_d = fluid.create_lod_tensor(np.array([[1.1], [2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], place)
y_d = fluid.create_lod_tensor(np.array([[1.1],[1.1],[1.1],[1.1],[1.1],[1.1]]).astype('float32'), [[1,3], [1,2,1,2]], place)
#Start computing
results = exe.run(fluid.default_main_program(),
feed={'x':x_d, 'y': y_d },
fetch_list=[out],return_numpy=False)
#Output result
print("The data of the result: {}.".format(np.array(results[0])))
#print the length of sequence of result
print("The recursive sequence lengths of the result: {}.".format(results[0].recursive_sequence_lengths()))
#print the LoD of result
print("The LoD of the result: {}.".format(results[0].lod()))
Summary
========
By now you should have a solid grasp of the LoD-Tensor concept. Trying to change x_d and y_d in the code above and checking the output may help you understand this flexible structure better; one such variation is sketched below.
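For instance, one such variation (an untested sketch; the expected output assumes that sequence_expand repeats the i-th sequence of x according to the i-th value of y's level-0 LoD):

.. code-block:: python

    # keep x_d unchanged, but let the first-level LoD of y_d be [2, 2]
    y_d = fluid.create_lod_tensor(
        np.array([[1.1], [1.1], [1.1], [1.1], [1.1], [1.1]]).astype('float32'),
        [[2, 2], [1, 2, 1, 2]], place)
    # expected: the first sequence [1.1] and the second sequence [2.2, 3.3, 4.4]
    # of x are each repeated twice, giving
    # [[1.1], [1.1], [2.2], [3.3], [4.4], [2.2], [3.3], [4.4]]
    # with recursive sequence lengths [[1, 1, 3, 3]]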
For more model applications of LoDTensor, you can refer to `Word2vec <../../../beginners_guide/basics/word2vec/index_en.html>`_ , `Individual Recommendation <../../../beginners_guide/basics/recommender_system/index_en.html>`_ and `Sentiment Analysis <../../../beginners_guide/basics/understand_sentiment/index_en.html>`_ in the beginner's guide.
For more difficult and complex examples of application, please refer to the associated information about `models <../../../user_guides/models/index_en.html>`_ .
.. _cn_user_guide_Operator:

========
Operator
========

In PaddlePaddle (hereafter "Paddle"), every operation on data is represented by an Operator.

For ease of use, on the Python side the Operators in Paddle are wrapped into modules such as :code:`paddle.fluid.layers` and :code:`paddle.fluid.nets` .

Since some common operations on Tensors may consist of several more fundamental operations, the framework wraps the fundamental Operators for convenience, including creating the learnable parameters an Operator depends on, handling the initialization details of those parameters, and so on, which saves users from repetitive development work.

For example, you can use :code:`paddle.fluid.layers.elementwise_add()` to add two input Tensors:

.. code-block:: python

    import paddle.fluid as fluid
    import numpy

    a = fluid.data(name="a", shape=[1], dtype='float32')
    b = fluid.data(name="b", shape=[1], dtype='float32')

    result = fluid.layers.elementwise_add(a, b)

    # create the executor and specify CPU as the execution device
    cpu = fluid.core.CPUPlace()
    exe = fluid.Executor(cpu)
    exe.run(fluid.default_startup_program())

    x = numpy.array([5]).astype("float32")
    y = numpy.array([7]).astype("float32")

    outs = exe.run(
        feed={'a': x, 'b': y},
        fetch_list=[result])

    # print the result: [array([12.], dtype=float32)]
    print(outs)

If you want to see the concrete values of a and b during network execution, add the variables you want to inspect to fetch_list.

.. code-block:: python

    # run the computation
    outs = exe.run(
        feed={'a': x, 'b': y},
        fetch_list=[a, b, result])

    # check the output
    print(outs)

Output:

.. code-block:: python

    [array([5.], dtype=float32), array([7.], dtype=float32), array([12.], dtype=float32)]

.. _cn_user_guide_Program:

=======
Program
=======

PaddlePaddle (hereafter "Paddle") describes the whole computation dynamically in the form of a Program. This way of describing models combines the flexibility of modifying a network structure with the convenience of building a model, and greatly improves the framework's ability to express models while preserving performance.

The Operators defined by the user are placed into the Program in order. Since native Python control flow cannot be used while the network is being built, Paddle provides control-flow OPs for both branching and looping, so that users can describe arbitrarily complex models by composing them.

**Sequential execution:**

You can build a network that executes sequentially:

.. code-block:: python

    x = fluid.data(name='x', shape=[None, 13], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1, act=None)
    y = fluid.data(name='y', shape=[None, 1], dtype='float32')
    cost = fluid.layers.square_error_cost(input=y_predict, label=y)

**Conditional branches -- switch, if else:**

Fluid provides the switch and if-else classes to implement conditional selection. You can use this structure to, for example, adjust the learning rate in a learning-rate scheduler:

.. code-block:: python

    # zero_var and global_step are defined here so the snippet is self-contained
    zero_var = fluid.layers.fill_constant(
        shape=[1], dtype='int64', value=0)
    global_step = fluid.layers.fill_constant(
        shape=[1], dtype='int64', value=0)

    lr = fluid.layers.tensor.create_global_var(
        shape=[1],
        value=0.0,
        dtype='float32',
        persistable=True,
        name="learning_rate")

    one_var = fluid.layers.fill_constant(
        shape=[1], dtype='float32', value=1.0)
    two_var = fluid.layers.fill_constant(
        shape=[1], dtype='float32', value=2.0)

    with fluid.layers.control_flow.Switch() as switch:
        with switch.case(global_step == zero_var):
            fluid.layers.tensor.assign(input=one_var, output=lr)
        with switch.default():
            fluid.layers.tensor.assign(input=two_var, output=lr)

For the detailed design ideas behind Program in Paddle, see `Fluid design ideas <../../advanced_guide/addon_development/design_idea/fluid_design_idea.html>`_ .

For more control flow in Paddle, see the `API documentation <../../../api_guides/low_level/layers/control_flow.html>`_ .
# Programming Guide

PaddlePaddle (hereafter "Paddle") now supports both the imperative (DyGraph) and the declarative (static graph) programming paradigms.
This document focuses on the declarative paradigm; for the imperative one, please refer to [DyGraph](../dygraph/DyGraph.html).

After reading this document, you will know how to represent and define data variables in Paddle's declarative mode, and how to build a complete deep learning network and train it.

## Representing and defining data

Like other mainstream frameworks, Paddle uses the Tensor data structure to hold data, including the learnable parameters of a model (network weights, biases, and so on), the input and output data of each layer in the network, constants, and so on.

A Tensor can simply be understood as a multi-dimensional array, in general with any number of dimensions.
Different Tensors can have their own data types and shapes; all elements within one Tensor share the same data type, and the shape of a Tensor is its dimensions. For a detailed introduction to Tensor, see [Tensor](../tensor.html).

In Paddle we use `fluid.data` to create data variables. `fluid.data` requires the shape and the data type of the Tensor; when a dimension cannot be determined in advance, it can be specified as None, as the following snippet shows:

```python
import paddle.fluid as fluid

# Define a 2-D int64 data variable x: its first dimension is 3, and its second
# dimension is unknown until the program runs, so its shape is given as [3, None]
x = fluid.data(name="x", shape=[3, None], dtype="int64")

# Most networks organize data in batches. The batch size is unknown at definition
# time, so the batch dimension (usually the first) can be specified as None
batched_x = fluid.data(name="batched_x", shape=[None, 3, None], dtype='int64')
```

Besides `fluid.data`, we can also use `fluid.layers.fill_constant` to create constants.
The code below creates a Tensor of shape [3, 4] and data type int64 in which all elements are 16 (the value specified by the value argument).

```python
import paddle.fluid as fluid
data = fluid.layers.fill_constant(shape=[3, 4], value=16, dtype='int64')
```

The examples above only use one data type, "int64", i.e. a signed 64-bit integer. For more data types currently supported by Paddle, see [supported data types](../../../advanced_guide/data_preparing/feeding_data.html#fluid).

Note that in the declarative programming mode, the Tensors defined above do not hold values (even if a value is specified when creating a constant);
they only describe the operations to be performed, and the actual assignment happens when the network is executed (for training or inference).
If you print the data variable from the example above directly, you will get a description of its metadata:

```python
print(data)
```

Output:

```
name: "fill_constant_0.tmp_0"
type {
  type: LOD_TENSOR
  lod_tensor {
    tensor {
      data_type: INT64
      dims: 3
      dims: 4
    }
  }
}
persistable: false
```

There are two ways to obtain the value of a Tensor during network execution: the first is to create a print operation with `paddle.fluid.layers.Print` to print the Tensor being accessed; the second is to add the Variable to fetch_list.

The code for the first way is as follows:

```python
import paddle.fluid as fluid
data = fluid.layers.fill_constant(shape=[3, 4], value=16, dtype='int64')
data = fluid.layers.Print(data, message="Print data:")

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
ret = exe.run()
```

Runtime output:

```
1571742368  Print data:     The place is:CPUPlace
Tensor[fill_constant_0.tmp_0]
    shape: [3,4,]
    dtype: x
    data: 16,16,16,16,16,16,16,16,16,16,16,16,
```

The second way, via fetch_list, is described in detail later.

## Reading data

After creating data variables with `fluid.data`, we need to read the data required by the network into the corresponding variables.
For the concrete data preparation process, please read [Prepare Data](../../../advanced_guide/data_preparing/index_cn.html).

## Building the network

In Paddle, the data-computing APIs are collectively called Operators (OPs for short); most OPs live in the `paddle.fluid.layers` module.

For example, you can use `paddle.fluid.layers.elementwise_add()` to add two input Tensors:

```python
# Define the variables
import paddle.fluid as fluid
a = fluid.data(name="a", shape=[None, 1], dtype='int64')
b = fluid.data(name="b", shape=[None, 1], dtype='int64')

# Build the network (here it consists of a single operation, elementwise_add)
result = fluid.layers.elementwise_add(a, b)

# Prepare to run the network
cpu = fluid.CPUPlace()  # choose the execution device; here we train on CPU
exe = fluid.Executor(cpu)  # create the executor
exe.run(fluid.default_startup_program())  # initialize the network parameters

# Read the input data
import numpy
data_1 = int(input("Please enter an integer: a="))
data_2 = int(input("Please enter an integer: b="))
x = numpy.array([[data_1]])
y = numpy.array([[data_2]])

# Run the network
outs = exe.run(
    feed={'a': x, 'b': y},  # assign the input data x and y to variables a and b
    fetch_list=[result]  # specify the variables to fetch via fetch_list
)

# Print the result
print("%d+%d=%d" % (data_1, data_2, outs[0][0]))
```

Output:

```
Please enter an integer: a=7
Please enter an integer: b=3
7+3=10
```

In this run, the inputs were a=7 and b=3, and the output was 10.

You can copy the code, run it locally, and enter other values following the prompts to observe the results.

If you want to see the concrete values of a and b during network execution, add the variables you want to inspect to fetch_list.

```python
...
# Run the network
outs = exe.run(
    feed={'a': x, 'b': y},  # assign the input data x and y to variables a and b
    fetch_list=[a, b, result]  # specify the variables to fetch via fetch_list
)
# Print the results
print(outs)
```

Output:

```
[array([[7]]), array([[3]]), array([[10]])]
```

## Building more complex networks

In some scenarios, the network must decide which operation to perform next, or repeat some operations, depending on its current state. In the imperative mode, Python control-flow statements (for, if-else, and so on) can be used directly. In the declarative mode, however, no operation is actually executed while the network is being built and no intermediate results exist, so Python control-flow statements cannot make these decisions; instead, the declarative mode provides several control-flow APIs. Here we take [fluid.layers.while_loop](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/layers_cn/while_loop_cn.html) as an example of how to implement a conditional loop in the declarative mode.

The while_loop API provides while/for-like loop control. It takes a callable, cond, as the loop condition: as long as cond returns True, while_loop keeps executing the loop body, body (also a callable), until cond returns False. For the detailed definition of the while_loop API, see [fluid.layers.while_loop](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/layers_cn/while_loop_cn.html).

The example below uses the while_loop API to implement a conditional loop equivalent to the following Python code:

```python
i = 0
ten = 10
while i < ten:
    i = i + 1
print('i =', i)
```

The same logic implemented with the while_loop API in declarative mode:

```python
# This code requires PaddlePaddle 1.7+
# The example increments an integer counter in a loop, 10 times, and prints the count
import paddle.fluid as fluid
import paddle.fluid.layers as layers

# define the cond method, used as the loop condition of while_loop
def cond(i, ten):
    return i < ten

# define the body method, the loop body of while_loop; it is called repeatedly
# as long as cond returns True.
# Both cond and body receive the variables listed in while_loop's loop_vars
# argument, so they must take the same number of parameters. body only needs i,
# but it still has to accept two parameters, hence the "dummy" placeholder.
def body(i, dummy):
    # increment the input parameter i, i.e. i = i + 1
    i = i + 1
    return i, dummy

i = layers.fill_constant(shape=[1], dtype='int64', value=0)  # loop counter
ten = layers.fill_constant(shape=[1], dtype='int64', value=10)  # number of iterations
# while_loop returns a list of tensors with the same length, structure and types as loop_vars
out, ten = layers.while_loop(cond=cond, body=body, loop_vars=[i, ten])

exe = fluid.Executor(fluid.CPUPlace())
res = exe.run(fluid.default_main_program(), feed={}, fetch_list=[out])
print(res)  # [array([10])]
```

For reasons of space, the example above only shows the simplest loop in declarative mode. Loops play an important role in many applications. For example, the Transformer model commonly used in NLP needs loops for selecting and generating candidates in the beam search of its decoding (generation) stage; see the [Transformer](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleMT/transformer) implementation to learn more about while_loop in complex scenarios.

Besides while_loop, Paddle also provides the fluid.layers.cond API for conditional branches, and the fluid.layers.switch_case and fluid.layers.case APIs for branch control, as sketched below. For detailed usage, see the documentation: [cond](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/layers_cn/cond_cn.html), [switch_case](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/layers_cn/switch_case_cn.html) and [case](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/layers_cn/case_cn.html#case).
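A minimal sketch of `fluid.layers.cond` (assuming Paddle 1.7+, where the API is available; the constants are illustrative):

```python
import paddle.fluid as fluid
import paddle.fluid.layers as layers

a = layers.fill_constant(shape=[1], dtype='float32', value=1.0)
b = layers.fill_constant(shape=[1], dtype='float32', value=2.0)

# choose a branch depending on a < b:
# the true branch computes a + a, the false branch computes a + b
out = layers.cond(a < b, lambda: a + a, lambda: a + b)

exe = fluid.Executor(fluid.CPUPlace())
res = exe.run(fluid.default_main_program(), fetch_list=[out])
print(res)  # expected: [array([2.], dtype=float32)]
```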
## A complete network example

A typical model usually contains 4 parts: the definition of the input data, the network (the model's forward computing logic), the loss function, and the choice of optimization algorithm.

Below we use a very simple prediction network (linear regression) to show the complete process of building and training a deep learning model with Paddle's declarative mode.

Problem description: given a set of data $<X,Y>$, find a function $f$ such that $y=f(x)$, where $X$ and $Y$ are both one-dimensional tensors. The network should finally be able to predict $y_{\_predict}$ accurately from the input $x$.

1. Define the data

Suppose the input data are X=[1 2 3 4] and Y=[2 4 6 8]; define them in the network:

```python
# define the values of X
train_data = numpy.array([[1.0], [2.0], [3.0], [4.0]]).astype('float32')
# define the expected ground-truth values y_true
y_true = numpy.array([[2.0], [4.0], [6.0], [8.0]]).astype('float32')
```

2. Build the network (define the forward computing logic)

Next, define the relationship between the prediction and the input. Here we use a simple linear regression function for prediction:

```python
# define the input data placeholders
x = fluid.data(name="x", shape=[None, 1], dtype='float32')
y = fluid.data(name="y", shape=[None, 1], dtype='float32')
# build a fully connected network
y_predict = fluid.layers.fc(input=x, size=1, act=None)
```

3. Add a loss function

After the model is built, how do we evaluate the quality of its predictions? We usually add a loss function to the network to measure the difference between the ground truth and the prediction.

In this example, the loss function is the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error):

```python
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_cost = fluid.layers.mean(cost)
```

4. Optimize the network

Once the loss function is determined, the loss value can be obtained by forward computing and the network parameters can be updated according to it. The simplest algorithm is stochastic gradient descent, $w=w-\eta \cdot g$, implemented by `fluid.optimizer.SGD`:

```python
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
sgd_optimizer.minimize(avg_cost)
```

Let's train the network for 100 iterations and check the result:

```python
# load the libraries
import paddle.fluid as fluid
import numpy

# define the input data
train_data = numpy.array([[1.0], [2.0], [3.0], [4.0]]).astype('float32')
y_true = numpy.array([[2.0], [4.0], [6.0], [8.0]]).astype('float32')

# build the network
x = fluid.data(name="x", shape=[None, 1], dtype='float32')
y = fluid.data(name="y", shape=[None, 1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)

# define the loss function
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_cost = fluid.layers.mean(cost)

# choose the optimization method
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
sgd_optimizer.minimize(avg_cost)

# initialize the network parameters
cpu = fluid.CPUPlace()
exe = fluid.Executor(cpu)
exe.run(fluid.default_startup_program())

# train for 100 iterations
for i in range(100):
    outs = exe.run(
        feed={'x': train_data, 'y': y_true},
        fetch_list=[y_predict, avg_cost])

# print the training result
print(outs)
```

Output:

```
[array([[2.2075021],
       [4.1005487],
       [5.9935956],
       [7.8866425]], dtype=float32), array([0.01651453], dtype=float32)]
```

After 100 iterations the predictions are already very close to the ground truth, and the loss has dropped to 0.0165.

Congratulations! You have successfully built your first simple network. If you want to try the advanced version of linear regression, the housing price prediction model, please read [Linear Regression](../../../user_guides/simple_case/fit_a_line/README.cn.html). More model examples can be found in [Typical Cases](../../../user_guides/index_cn.html).

<a name="what_next"></a>
## Further learning

If you have mastered the basic operations, you can move on to the next stage:

This tutorial teaches you how to model a practical problem and build it with Paddle: [Set up a Simple Network](../../coding_practice/configure_simple_model/index_cn.html).

After building the network, you can start training it on a single machine; for detailed steps see [Single-node training](../../coding_practice/single_node.html).

In addition, the documentation is organized into three stages for developers with different backgrounds: [Quick Start](../../index_cn.html), [Typical Cases](../../../user_guides/index_cn.html) and [Advanced Guide](../../../advanced_guide/index_cn.html).

If you want to read application cases in more scenarios, see [Typical Cases](../../../user_guides/index_cn.html). Users who already have a deep learning background can start from the [Advanced Guide](../../../advanced_guide/index_cn.html).
# Guide to Fluid Programming
This document will instruct you to program and create a simple neural network with the Fluid API. From this guide you will get the hang of:

- Core concepts of Fluid
- How to define the computing process in Fluid
- How to run fluid operators with an executor
- How to model practical problems logically
- How to call APIs (layers, datasets, loss functions, optimization methods and so on)

Before building a model, you need to figure out several core concepts of Fluid first:
## Express data with Tensor
Like other mainstream frameworks, Fluid uses Tensor to hold data.
All data passed through a neural network are Tensors, which can simply be regarded as multi-dimensional arrays. In general, a Tensor can have any number of dimensions. Each Tensor has its own data type and shape; all elements in a single Tensor share the same data type, and **the shape of a Tensor** refers to its dimensions.
The picture below visualizes Tensors with one to six dimensions:
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/FluidDoc/develop/doc/fluid/beginners_guide/image/tensor.jpg" width="400">
</p>
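For instance, here is a minimal sketch of Tensors of different ranks created with `fluid.layers.fill_constant` (the shapes are arbitrary illustrative choices):

```python
import paddle.fluid as fluid

# tensors of increasing rank, all filled with zeros
vec = fluid.layers.fill_constant(shape=[3], value=0, dtype='float32')         # 1-D
mat = fluid.layers.fill_constant(shape=[3, 4], value=0, dtype='float32')      # 2-D
cube = fluid.layers.fill_constant(shape=[2, 3, 4], value=0, dtype='float32')  # 3-D
```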
There are three special kinds of Tensor in Fluid:
**1. Learnable parameters of models**
The learnable parameters of a model (such as network weights, biases and so on) live as long as the training task and are updated by the optimization algorithms. Parameter, a derived class of Variable, is used to represent them.
We can create learnable parameters with `fluid.layers.create_parameter` in Fluid:
```python
w = fluid.layers.create_parameter(name="w",shape=[1],dtype='float32')
```
In general, you don't need to explicitly create the learnable parameters of a network. Fluid encapsulates most fundamental computing modules of common networks. Taking the fully connected model as the simplest example, the code below creates the connection weight (W) and the bias (bias) for the fully connected layer, with no need to explicitly call the Parameter-related APIs.

```python
import paddle.fluid as fluid

# an illustrative input placeholder for the fc layer to consume
x = fluid.layers.data(name="x", shape=[32], dtype='float32')
y = fluid.layers.fc(input=x, size=128, bias_attr=True)
```
**2. Input and Output Tensor**
The input data of the whole neural network is also a special kind of Tensor, in which the sizes of some dimensions cannot be determined when the model is defined. Such dimensions usually include the batch size, or the width and height of images when these vary across mini-batches. Placeholders for these uncertain dimensions are necessary when the model is defined.
`fluid.layers.data` is used to receive input data in Fluid, and it needs to be provided with the shape of the input Tensor. When the shape is not certain, the corresponding dimension is defined as None.
The code below exemplifies the usage of `fluid.layers.data` :
```python
import paddle.fluid as fluid
#Define the dimension of x : [3,None]. What we could make sure is that the first dimension of x is 3.
#The second dimension is unknown and can only be known at runtime.
x = fluid.layers.data(name="x", shape=[3,None], dtype="int64")
#batch size doesn't have to be defined explicitly.
#Fluid will automatically take the zeroth dimension as the batch size dimension and fill in the right number at runtime.
a = fluid.layers.data(name="a",shape=[3,4],dtype='int64')
#If the width and height of image are variable, we can define the width and height as None.
#The meaning of three dimensions of shape is channel, width of image, height of image respectively.
b = fluid.layers.data(name="image",shape=[3,None,None],dtype="float32")
```
dtype="int64" indicates a signed 64-bit integer data type. For more data types supported by Fluid, please refer to [Data types currently supported by Fluid](../../user_guides/howto/prepare_data/feeding_data_en.html#fluid).
**3. Constant Tensor**
`fluid.layers.fill_constant` is used to define constant Tensor in Fluid. You can define the shape, data type and value of Constant Tensor. Code is as follows:
```python
import paddle.fluid as fluid
data = fluid.layers.fill_constant(shape=[1], value=0, dtype='int64')
```
Notice that the tensor defined above is not assigned any values; it merely represents the operation to perform. If you print data directly, you will get a description of this variable:
```python
print(data)
```
Output:
```
name: "fill_constant_0.tmp_0"
type {
type: LOD_TENSOR
lod_tensor {
tensor {
data_type: INT64
dims: 1
}
}
}
persistable: false
```
Specific output values will be shown when the Executor runs. There are two ways to get the runtime value of a Variable. The first way is to use `paddle.fluid.layers.Print` to create a print op that prints the tensor being accessed. The second way is to add the Variable to fetch_list.
Code of the first way is as follows:
```python
import paddle.fluid as fluid
data = fluid.layers.fill_constant(shape=[1], value=0, dtype='int64')
data = fluid.layers.Print(data, message="Print data: ")
```
Output at the runtime of Executor:
```
1563874307 Print data: The place is:CPUPlace
Tensor[fill_constant_0.tmp_0]
shape: [1,]
dtype: x
data: 0,
```
For more information on how to use the Print API, please refer to [Print operator](https://www.paddlepaddle.org.cn/documentation/docs/en/1.5/api/layers/control_flow.html#print).
The detailed process of the second way, fetch_list, will be explained later.
## Feed data
The method to feed data in Fluid:
You need to use `fluid.layers.data` to configure the data input layer, and use ``executor.run(feed=...)`` to feed the training data into `fluid.Executor` or `fluid.ParallelExecutor` .
For specific preparation for data, please refer to [Preparation for data](../../../advanced_guide/data_preparing/index_en.html).
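As a minimal sketch of this feed workflow (the network here is a single `scale` operation, chosen just for illustration):

```python
import numpy
import paddle.fluid as fluid

# configure the data input layer
x = fluid.layers.data(name='x', shape=[1], dtype='float32')
# a trivial network: multiply the input by 2
y = fluid.layers.scale(x, scale=2.0)

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# feed a batch of two samples into the placeholder named 'x'
data = numpy.array([[1.0], [2.0]]).astype('float32')
outs = exe.run(feed={'x': data}, fetch_list=[y.name])
print(outs)  # expected: [array([[2.], [4.]], dtype=float32)]
```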
## Operators -- operations on data
All operations on data are achieved by Operators in Fluid.
To facilitate development, on Python end, Operators in Fluid are further encapsulated into `paddle.fluid.layers` , `paddle.fluid.nets` and other modules.
Because some common operations on Tensors may be composed of many fundamental operations, Fluid encapsulates the fundamental Operators to make them more convenient and to reduce repeated coding, including the creation of the learnable parameters an Operator relies on, the initialization details of those parameters, and so on.
For example, you can use `paddle.fluid.layers.elementwise_add()` to add up two input Tensors:
```python
#Define network
import paddle.fluid as fluid
a = fluid.layers.data(name="a",shape=[1],dtype='float32')
b = fluid.layers.data(name="b",shape=[1],dtype='float32')
result = fluid.layers.elementwise_add(a,b)
#Define Exector
cpu = fluid.core.CPUPlace() #define computing place. Here we choose to train on CPU
exe = fluid.Executor(cpu) #create executor
exe.run(fluid.default_startup_program()) #initialize network parameters
#Prepare data
import numpy
data_1 = int(input("Please enter an integer: a="))
data_2 = int(input("Please enter an integer: b="))
x = numpy.array([[data_1]])
y = numpy.array([[data_2]])
#Run computing
outs = exe.run(
feed={'a':x,'b':y},
fetch_list=[result.name])
#Verify result
print "%d+%d=%d" % (data_1,data_2,outs[0][0])
```
Output:
```
Please enter an integer: a=7
Please enter an integer: b=3
7+3=10
```
At runtime, input a=7 and b=3, and you will get the output 10.
You can copy the code, run it locally, input different numbers following the prompt instructions and check the computed result.
If you want to get the specific value of a,b at the runtime of network, you can add variables you want to check into ``fetch_list`` .
```python
...
#Run computing
outs = exe.run(
    feed={'a':x,'b':y},
    fetch_list=[a,b,result.name])
#Check output
print(outs)
```
Output:
```
[array([[7]]), array([[3]]), array([[10]])]
```
## Use Program to describe neural network model
Fluid is different from most other deep learning frameworks: instead of a static computation graph, it uses a Program to describe the network dynamically. This dynamic description delivers both flexible modification of the network structure and convenient model building, and significantly enhances the capability to express a model while the performance is guaranteed.
All Operators are written into a Program, which Fluid automatically transforms into a descriptive language called ProgramDesc. Defining a Program is just like writing a general-purpose program, so if you are an experienced developer, you can naturally apply the knowledge you have already acquired to Fluid programming.
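To see this recording in action, you can print the default Program after defining a few operators, which shows the ProgramDesc they were transformed into (a minimal sketch):

```python
import paddle.fluid as fluid

a = fluid.layers.data(name="a", shape=[1], dtype='float32')
b = fluid.layers.data(name="b", shape=[1], dtype='float32')
result = fluid.layers.elementwise_add(a, b)

# the operators above were recorded into the default Program;
# printing it shows their textual ProgramDesc
print(fluid.default_main_program())
```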
You can describe any complex model by combining sequential processes, branches and loops supported by Fluid.
**Sequential Process**
You can use sequential structure to build network:
```python
x = fluid.layers.data(name='x',shape=[13], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
```
**Conditional branch -- switch, if else:**
Fluid provides the Switch and IfElse classes to implement conditional branching. You can use this structure to adjust the learning rate in a learning-rate scheduler or to perform other operations:
```python
# zero_var and global_step are defined here so the snippet is self-contained
zero_var = fluid.layers.fill_constant(
    shape=[1], dtype='int64', value=0)
global_step = fluid.layers.fill_constant(
    shape=[1], dtype='int64', value=0)

lr = fluid.layers.tensor.create_global_var(
    shape=[1],
    value=0.0,
    dtype='float32',
    persistable=True,
    name="learning_rate")

one_var = fluid.layers.fill_constant(
    shape=[1], dtype='float32', value=1.0)
two_var = fluid.layers.fill_constant(
    shape=[1], dtype='float32', value=2.0)

with fluid.layers.control_flow.Switch() as switch:
    with switch.case(global_step == zero_var):
        fluid.layers.tensor.assign(input=one_var, output=lr)
    with switch.default():
        fluid.layers.tensor.assign(input=two_var, output=lr)
```
For detailed design principles of Program, please refer to [Design principle of Fluid](../../../advanced_guide/addon_development/design_idea/fluid_design_idea_en.html).
For more about control flow in Fluid, please refer to [Control Flow](../../api/layers.html#control-flow).
## Use Executor to run Program
The design of Fluid is similar to that of C++, Java and other high-level programming languages: the execution of a program is divided into two steps, compilation and running.
The Executor accepts the defined Program and transforms it into a really executable Fluid Program in the C++ back-end. This automatic process is the compilation.
After compilation, the Executor is needed to run the compiled Fluid Program.
Taking the add operator above as an example, after constructing the Program you need to create an Executor to initialize and then train it:
```python
#define Executor
cpu = fluid.core.CPUPlace() #define computing place. Here we choose training on CPU
exe = fluid.Executor(cpu) #create executor
exe.run(fluid.default_startup_program()) #initialize Program
#train Program and start computing
#feed defines the order of data transferred to network in the form of dict
#fetch_list defines the output of network
outs = exe.run(
feed={'a':x,'b':y},
fetch_list=[result.name])
```
## Code example
So far you have acquired a basic knowledge of the core concepts in Fluid, so why not try to configure a simple network? If you are interested, you can complete a very simple data prediction under the guidance of this part. If you have already mastered it, you can skip this section and read [What's next](#what_next).
Firstly, define the input data format, the model structure, the loss function and the optimization algorithm logically. Then use PaddlePaddle APIs and operators to implement the model logic. A typical model mainly contains four parts: the definition of the input data format, the forward computing logic, the loss function, and the optimization algorithm.
1. Problem
Given a pair of data $<X,Y>$, construct a function $f$ so that $y=f(x)$, where $X$ and $Y$ are both one-dimensional Tensors. The network should finally be able to predict $y_{\_predict}$ accurately according to the input $x$.
2. Define data
Suppose the input data are X=[1 2 3 4] and Y=[2 4 6 8]; define them in the network:
```python
#define X
train_data=numpy.array([[1.0],[2.0],[3.0],[4.0]]).astype('float32')
#define ground-truth y_true expected to get from the model prediction
y_true = numpy.array([[2.0],[4.0],[6.0],[8.0]]).astype('float32')
```
3. Create network (define forward computing logic)
Next you need to define the relationship between the predicted value and the input. Take a simple linear regression function for example:
```python
#define input data type
x = fluid.layers.data(name="x",shape=[1],dtype='float32')
#create fully connected network
y_predict = fluid.layers.fc(input=x,size=1,act=None)
```
Now the network can produce a prediction, although the output is just a group of random numbers, still far from the expected results:
```python
#load library
import paddle.fluid as fluid
import numpy
#define data
train_data=numpy.array([[1.0],[2.0],[3.0],[4.0]]).astype('float32')
y_true = numpy.array([[2.0],[4.0],[6.0],[8.0]]).astype('float32')
#define predict function
x = fluid.layers.data(name="x",shape=[1],dtype='float32')
y_predict = fluid.layers.fc(input=x,size=1,act=None)
#initialize parameters
cpu = fluid.core.CPUPlace()
exe = fluid.Executor(cpu)
exe.run(fluid.default_startup_program())
#start training
outs = exe.run(
feed={'x':train_data},
fetch_list=[y_predict.name])
#observe result
print(outs)
```
Output:
```
[array([[0.74079144],
[1.4815829 ],
[2.2223744 ],
[2.9631658 ]], dtype=float32)]
```
4. Add loss function
After the construction of the model, we need to evaluate the output in order to make accurate predictions. How do we evaluate the result of a prediction? We usually add a loss function to the network to compute the *distance* between the ground-truth value and the predicted value.
In this example, we adopt the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error) as our loss function:
```python
# y is the label placeholder; it is also defined in the complete code below
y = fluid.layers.data(name="y", shape=[1], dtype='float32')
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_cost = fluid.layers.mean(cost)
```
Output the predicted value and the loss after one pass of computing:
```python
#load library
import paddle.fluid as fluid
import numpy
#define data
train_data=numpy.array([[1.0],[2.0],[3.0],[4.0]]).astype('float32')
y_true = numpy.array([[2.0],[4.0],[6.0],[8.0]]).astype('float32')
#define network
x = fluid.layers.data(name="x",shape=[1],dtype='float32')
y = fluid.layers.data(name="y",shape=[1],dtype='float32')
y_predict = fluid.layers.fc(input=x,size=1,act=None)
#define loss function
cost = fluid.layers.square_error_cost(input=y_predict,label=y)
avg_cost = fluid.layers.mean(cost)
#initialize parameters
cpu = fluid.core.CPUPlace()
exe = fluid.Executor(cpu)
exe.run(fluid.default_startup_program())
#start training
outs = exe.run(
feed={'x':train_data,'y':y_true},
fetch_list=[y_predict.name,avg_cost.name])
#observe output
print(outs)
```
Output:
```
[array([[0.9010564],
[1.8021128],
[2.7031693],
[3.6042256]], dtype=float32), array([9.057577], dtype=float32)]
```
We discover that the loss value after the first iteration is about 9.06, which shows there is great room for improvement.
5. Optimization of network
After the loss function is defined, you can get the loss value by forward computing and then obtain the gradients of the parameters with the chain rule.
The parameters should be updated once the gradients are obtained. The simplest algorithm is stochastic gradient descent, w=w−η⋅g, which is implemented by `fluid.optimizer.SGD`:
```python
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
sgd_optimizer.minimize(avg_cost)
```
Let's train the network for 100 iterations and check the results:
```python
#load library
import paddle.fluid as fluid
import numpy
#define data
train_data=numpy.array([[1.0],[2.0],[3.0],[4.0]]).astype('float32')
y_true = numpy.array([[2.0],[4.0],[6.0],[8.0]]).astype('float32')
#define network
x = fluid.layers.data(name="x",shape=[1],dtype='float32')
y = fluid.layers.data(name="y",shape=[1],dtype='float32')
y_predict = fluid.layers.fc(input=x,size=1,act=None)
#define loss function
cost = fluid.layers.square_error_cost(input=y_predict,label=y)
avg_cost = fluid.layers.mean(cost)
#define optimization algorithm
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
sgd_optimizer.minimize(avg_cost)
#initialize parameters
cpu = fluid.core.CPUPlace()
exe = fluid.Executor(cpu)
exe.run(fluid.default_startup_program())
#start training and iterate for 100 times
for i in range(100):
outs = exe.run(
feed={'x':train_data,'y':y_true},
fetch_list=[y_predict.name,avg_cost.name])
#observe result
print(outs)
```
Output:
```
[array([[2.2075021],
[4.1005487],
[5.9935956],
[7.8866425]], dtype=float32), array([0.01651453], dtype=float32)]
```
Now we discover that the predicted values are very close to the real values and that the loss has descended from the original 9.06 to 0.0165 after 100 iterations.
Congratulations! You have succeeded in creating a simple network. If you want to try the advanced version of linear regression, the housing price prediction model, please read [linear regression](../../../user_guides/simple_case/fit_a_line/README.html). More examples of models can be found in [User Guides](../../../user_guides/index_en.html).
<a name="what_next"></a>
## What's next
If you have been familiar with fundamental operations, you can start your next journey to learn fluid:
You will learn how to build model for practical problem with fluid: [The configuration of simple network](../../coding_practice/configure_simple_model/index_en.html).
After the construction of network, you can start training your network in single node. For detailed procedures, please refer to [Single-node training](../../coding_practice/single_node_en.html).
In addition, there are three learning levels in documentation according to developer's background and experience: [Beginner's Guide](../../index_en.html) , [User Guides](../../../user_guides/index_en.html) and [Advanced Guide](../../../advanced_guide/index_en.html).
If you want to read examples in more application scenarios, you can go to [User Guides](../../../user_guides/index_en.html). If you have learned the basic knowledge of deep learning, you can read on from the [Advanced Guide](../../../advanced_guide/index_en.html).
.. _cn_user_guide_tensor:

=========
Tensor
=========

Like other frameworks, PaddlePaddle (hereafter "Paddle") uses Tensors to represent data.

All data passed through a neural network are Tensors. A Tensor can simply be understood as a multi-dimensional array, in general with any number of dimensions.
Different Tensors can have their own data types and shapes; all elements within one Tensor share the same data type, and the shape of a Tensor is its dimensions.

The picture below visualizes Tensors with one to six dimensions:

.. image:: ../image/tensor.jpg

**Advanced Paddle features**

:ref:`Lod-Tensor <cn_user_guide_lod_tensor>`

For tasks in which the sample sizes within a batch are not uniform, Paddle provides two solutions:

1. padding: pad the samples of different sizes to the same size; this is a common and recommended approach (see the sketch after this list);
2. :ref:`Lod-Tensor <cn_user_guide_lod_tensor>` : record the size of each sample to avoid useless computation; LoD sacrifices flexibility to improve performance.

If the samples within a batch cannot be made similar in size by bucketing, sorting and so on, using :ref:`Lod-Tensor <cn_user_guide_lod_tensor>` is recommended.
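As a concrete illustration of the padding option, here is a minimal NumPy-only sketch (not a Paddle API; the sequences are illustrative):

.. code-block:: python

    import numpy as np

    # three samples of different lengths
    seqs = [[1, 2, 3], [4, 5], [6]]
    max_len = max(len(s) for s in seqs)

    # pad with zeros up to the longest sample in the batch
    padded = np.zeros((len(seqs), max_len), dtype='int64')
    for i, s in enumerate(seqs):
        padded[i, :len(s)] = s

    print(padded)
    # [[1 2 3]
    #  [4 5 0]
    #  [6 0 0]]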
.. _cn_user_guide_Variable:

=========
Variable
=========

A :code:`Variable` in PaddlePaddle (hereafter "Paddle") can hold any type of value; the type used in the provided APIs is :ref:`Tensor <cn_user_guide_tensor>` .

In the rest of the documentation, :code:`Variable` is essentially equivalent to :ref:`Tensor <cn_user_guide_tensor>` (exceptions will be noted explicitly).

Three kinds of :code:`Variable` exist in Paddle:

**1. Learnable parameters of a model**

The learnable parameters of a model (network weights, biases and so on) live as long as the whole training task and are updated by the optimization algorithm. In Paddle they are represented by Parameter, a subclass of Variable.

Learnable parameters can be created in Paddle with :code:`fluid.layers.create_parameter` :

.. code-block:: python

    w = fluid.layers.create_parameter(name="w", shape=[1], dtype='float32')

Paddle provides wrappers for most common basic computing modules of neural networks. Taking the simplest fully connected model as an example, the snippet below directly creates the connection weight (W) and bias (bias) for the fully connected layer, with no need to call the Parameter-related APIs explicitly.

.. code-block:: python

    import paddle.fluid as fluid

    # an illustrative input placeholder for the fc layer to consume
    x = fluid.data(name="x", shape=[None, 32], dtype='float32')
    y = fluid.layers.fc(input=x, size=128, bias_attr=True)
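To see what the fc layer created behind the scenes, you can list the parameters recorded in the default program. This is a minimal sketch; the printed parameter names (for example fc_0.w_0 and fc_0.b_0) are framework-generated and shown here only as an assumption:

.. code-block:: python

    # list the parameters implicitly created by layers in the default program
    for p in fluid.default_main_program().global_block().all_parameters():
        print(p.name, p.shape)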
**2. Placeholder Variable**

In the declarative programming (static graph) mode, the actual input is usually unknown while the network is being built, so a placeholder :code:`Variable` is needed to represent an input that will be provided later.

Paddle uses :code:`fluid.data` to receive input data. :code:`fluid.data` needs the shape of the input Tensor; any dimension that cannot be determined is specified as None, as the following snippet shows:

.. code-block:: python

    import paddle.fluid as fluid

    # Define x with shape [3, None]: only the first dimension (3) is known;
    # the second dimension is unknown until the program runs
    x = fluid.data(name="x", shape=[3, None], dtype="int64")

    # If the width and height of an image vary at runtime, define them as None.
    # The four dimensions of shape are: batch_size, channel, image width, image height
    b = fluid.data(name="image", shape=[None, 3, None, None], dtype="float32")

Here dtype="int64" denotes a signed 64-bit integer data type. For more data types currently supported by Fluid, see :ref:`the supported data types <user_guide_use_numpy_array_as_train_data>` .

**3. Constant Variable**

Fluid implements constant Variables via :code:`fluid.layers.fill_constant` . You can specify the shape, data type and value of the contained Tensor. The code is as follows:

.. code-block:: python

    import paddle.fluid as fluid
    data = fluid.layers.fill_constant(shape=[1], value=0, dtype='int64')
.. _user_guide_configure_simple_model:

#######################
Set up a Simple Network
#######################

When solving a practical problem, you can first model the problem at the logical level and clarify the **input data type**, **computing logic**, **target solution** and **optimization algorithm** of the model. PaddlePaddle provides abundant operators to implement the model logic. Below, a simple regression task is used to illustrate how to build a model with PaddlePaddle. For the complete code of this example, see `fit_a_line <https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_fit_a_line.py>`_ .

Description and definition of the problem
#########################################

Description: given a set of data :math:`<X, Y>`, find a function :math:`f` such that :math:`y=f(x)`, where :math:`x\subset X` is the feature of one sample, a :math:`13`-dimensional real vector, and :math:`y \subset Y` is a real number representing the value corresponding to that sample.

We can try to model the problem with a regression model. Many loss functions are available for regression; here we choose the commonly used mean squared error. To simplify the problem, we assume :math:`f` is a simple linear transformation and choose the stochastic gradient descent algorithm to solve the model.

+------------------------+------------------------------------------------------------+
| input data type        | sample feature: a 13-dimensional real number vector        |
+                        +------------------------------------------------------------+
|                        | sample label: a 1-dimensional real number                  |
+------------------------+------------------------------------------------------------+
| computing logic        | a linear model producing a 1-dimensional real number       |
+------------------------+------------------------------------------------------------+
| target solution        | minimize the mean squared error between output and label   |
+------------------------+------------------------------------------------------------+
| optimization algorithm | stochastic gradient descent                                |
+------------------------+------------------------------------------------------------+

Modeling with PaddlePaddle
##########################

After the input data format, the model structure, the loss function and the optimization algorithm are clear at the logical level, you need to implement the model logic with the APIs and operators provided by PaddlePaddle. A typical model contains four parts: the definition of the input data format, the forward computing logic of the model, the loss function, and the optimization algorithm.

Data layer
----------

PaddlePaddle provides the :code:`fluid.data()` operator to describe the format of the input data.

The output of :code:`fluid.data()` is a Variable whose actual type is Tensor. A Tensor can represent multi-dimensional data with great expressive power. To describe a data structure precisely, you usually need to specify the shape and the numerical type, where the shape is an integer vector and the type can be a string. For the currently supported data types, see :ref:`user_guide_paddle_support_data_types` . Models are usually trained with data read in batches, and the batch size may change during training. The data operator infers the batch size from the actual data, so when providing the shape you do not need to care about the batch size and only need the shape of one sample; for more advanced usage see :ref:`user_guide_customize_batch_size_rank` . From the description above, :math:`x` is a :math:`13`-dimensional real vector and :math:`y` is a real number, so the data layer can be defined as:

.. code-block:: python

    x = fluid.data(name='x', shape=[13], dtype='float32')
    y = fluid.data(name='y', shape=[1], dtype='float32')

The data used by this model is fairly simple; in fact the data operator can also describe variable-length, nested sequence data. For more detailed documentation, see :ref:`user_guide_prepare_data` .

Forward computing logic
-----------------------

The most important part of implementing a model is implementing its computing logic, for which PaddlePaddle provides abundant operators. The operators are wrapped at different granularities, each usually corresponding to one kind or one group of transformation logic. The output of an operator is the result of applying the transformation to its input. Users can combine operators flexibly to build complex model logic; for example, image tasks use many convolution operators and sequence tasks use operators such as LSTM/GRU. Complex models usually combine multiple operators to accomplish complex transformations. PaddlePaddle makes composing operators very natural; in general the following style can be used:

.. code-block:: python

    op_1_out = fluid.layers.op_1(input=op_1_in, ...)
    op_2_out = fluid.layers.op_2(input=op_1_out, ...)
    ...

Here op_1 and op_2 stand for operator types; op_1 could be fc, performing a linear transformation (fully connected), or conv, performing a convolution. The execution order of the operators and the direction of the data flow are defined by the connections between their inputs and outputs. In the example above, the output of op_1 is the input of op_2, so op_1 is computed before op_2 at execution time. More complex models may need control-flow operators to execute dynamically depending on the input; for this PaddlePaddle provides IfElseOp, WhileOp and others. See :code:`fluid.layers` for the operator documentation. For this task we use a single fc operator:

.. code-block:: python

    y_predict = fluid.layers.fc(input=x, size=1, act=None)

Loss function
-------------

The loss function corresponds to the target solution: the model is solved by minimizing the loss. The loss functions used by most models output a single real value, but the loss operators provided by PaddlePaddle generally compute the loss per sample: when a batch is fed in, the loss operator outputs one value for each sample. Therefore an operator such as mean is usually applied after the loss operator to reduce the losses to one value. After one forward pass the model obtains a loss value, and PaddlePaddle automatically applies the chain rule to compute the gradient of every parameter and variable in the model. Here the mean squared error loss is used:

.. code-block:: python

    cost = fluid.layers.square_error_cost(input=y_predict, label=y)
    avg_cost = fluid.layers.mean(cost)

Optimization method
-------------------

After the loss function is determined, the loss value can be obtained by forward computing, and the gradients of the parameters follow from the chain rule. Once the gradients are available, the parameters need to be updated; the simplest algorithm is stochastic gradient descent: :math:`w=w - \eta \cdot g` . However, plain stochastic gradient descent has some problems, such as unstable convergence. To improve training speed and quality, many optimization algorithms have been proposed, including :code:`Momentum` , :code:`RMSProp` , :code:`Adam` and others. These optimizers use different strategies to update the model parameters, and the choice can generally be made according to the specific task and model. Whichever optimizer is used, the learning rate is usually an important hyperparameter that must be specified and tuned carefully through experiments. Here stochastic gradient descent is used:

.. code-block:: python

    sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)

For more optimizers, see :code:`fluid.optimizer()` .
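For instance, swapping in the Adam optimizer mentioned above only changes the construction line (a sketch; the learning rate is an illustrative value, not a recommendation):

.. code-block:: python

    # an illustrative alternative: Adam instead of SGD
    adam_optimizer = fluid.optimizer.Adam(learning_rate=0.001)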
What to do next?
################

When implementing a model with PaddlePaddle, pay attention to the **data layer**, the **forward computing logic**, the **loss function** and the **optimization method**. Different tasks require different data formats, involve different computing logic, and use different loss functions and optimization methods. PaddlePaddle provides plenty of model examples that you can use as references when building your own model structure. You can visit the `model zoo <https://github.com/PaddlePaddle/models/tree/develop/fluid>`_ for the official examples.
.. _user_guide_configure_simple_model_en:
#######################
Set up Simple Model
#######################
When solving practical problems, you can begin by modeling the problem logically to get a clear picture of the **input data type** , **computing logic** , **target solution** and **optimization algorithm** of the model.
PaddlePaddle provides abundant operators to implement the logic of a model. In this article, we take a simple regression task as an example to clarify how to build a model with PaddlePaddle.
For the complete code of the example, please refer to `fit_a_line <https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_fit_a_line.py>`_ .
Description and Definition of Problem
######################################
Description: Given a pair of data :math:`<X, Y>`, figure out a function :math:`f` such that :math:`y=f(x)` . :math:`x\subset X` represents the feature of a sample, which is a real number vector with 13 dimensions; :math:`y \subset Y` is a real number representing the value corresponding to the given sample.
We can try to model the problem with a regression model. Though lots of loss functions are available for regression problems, here we choose the commonly used mean squared error. To simplify the problem, we assume :math:`f` is a simple linear transformation function and choose the stochastic gradient descent algorithm to solve the model.
+------------------------+------------------------------------------------------------+
| input data type        | sample feature: a 13-dimensional real number vector        |
+                        +------------------------------------------------------------+
|                        | sample label: a 1-dimensional real number                  |
+------------------------+------------------------------------------------------------+
| computing logic        | a linear model producing a 1-dimensional real number       |
+------------------------+------------------------------------------------------------+
| target solution        | minimize the mean squared error between output and label   |
+------------------------+------------------------------------------------------------+
| optimization algorithm | stochastic gradient descent                                |
+------------------------+------------------------------------------------------------+
Model with PaddlePaddle
#######################
Having settled the input data format, model structure, loss function and optimization algorithm in terms of logic, you then need to use PaddlePaddle APIs and operators to implement the logic of the model. A typical model consists of four parts: the format of input data, the forward computing logic, the loss function and the optimization algorithm.
Data Layer
-----------
PaddlePaddle provides :code:`fluid.data()` to describe format of input data.
The output of :code:`fluid.data()` is a Variable, which is in fact a Tensor. Tensor can represent multi-dimensional data with great expressiveness. To accurately describe the data structure, it is usually necessary to indicate the shape and the type of the data. The shape is a vector of integers and the type is a string. About the currently supported data types, please refer to :ref:`user_guide_paddle_support_data_types_en` . Data is usually read in batches to train the model. Since the batch size may vary and the data operator infers the batch size from the actual data, the batch size is omitted here when the shape is given; it is enough to describe the shape of a single sample. For more advanced usage, please refer to :ref:`user_guide_customize_batch_size_rank_en` . :math:`x` is a real vector of :math:`13` dimensions while :math:`y` is a real number. The data layer can be defined as follows:
.. code-block:: python
x = fluid.data(name='x', shape=[13], dtype='float32')
y = fluid.data(name='y', shape=[1], dtype='float32')
The data used by this model is relatively simple. In fact, the data operator can also describe variable-length and nested sequence data. For more detailed documentation, please refer to :ref:`user_guide_prepare_data_en` .
Logic of Forward Computing
---------------------------
The most important part of a model is implementing its computing logic. PaddlePaddle provides a rich set of operators encapsulated at different granularities, each usually corresponding to one kind or one group of transformation logic. The output of an operator is the result of applying the transformation to the input data. Users can flexibly combine operators to implement models with complex logic. For example, many convolution operators are used in image-related tasks, and LSTM/GRU operators are used in sequence tasks. Complex models usually combine multiple operators to implement complex transformations. PaddlePaddle provides a natural way to combine operators; the typical pattern is as follows:
.. code-block:: python
op_1_out = fluid.layers.op_1(input=op_1_in, ...)
op_2_out = fluid.layers.op_2(input=op_1_out, ...)
...
In the example above, op_1 and op_2 represent operator types, such as fc, which performs a linear transformation (fully connected layer), or conv, which performs a convolution. The computing order of operators and the direction of data flow are defined by connecting the inputs and outputs of operators. In the example above, the output of op_1 is the input of op_2, so op_1 is computed before op_2. More complex models may need control flow operators to execute dynamically according to the input data; for this situation, PaddlePaddle provides IfElseOp, WhileOp and others. For the documentation of these operators, please refer to :code:`fluid.layers` . For this specific task, we use an fc operator:
.. code-block:: python
y_predict = fluid.layers.fc(input=x, size=1, act=None)
Loss Function
--------------
The loss function corresponds to the target solution, and the model can be solved by minimizing the loss. The loss functions of most models output a real number, but a loss operator in PaddlePaddle usually computes the loss per sample. When a batch of data is fed, the loss operator outputs multiple values, one for each sample, so an operator such as ``mean`` is usually appended after the loss operator to reduce the losses to a single value. After each forward iteration, a loss value is returned, and PaddlePaddle automatically applies the chain rule to compute the gradient of every parameter and variable in the model. Here we use the mean squared error cost:
.. code-block:: python
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_cost = fluid.layers.mean(cost)
Optimization Method
---------------------
After defining the loss function, we can obtain the loss value by forward computing and then obtain the gradients of the parameters with the chain rule. Having obtained the gradients, the parameters need to be updated; the simplest algorithm is stochastic gradient descent: :math:`w=w - \eta \cdot g` . However, plain stochastic gradient descent has some disadvantages, such as unstable convergence. To improve the training speed and quality of models, researchers have proposed many optimization algorithms, including :code:`Momentum` , :code:`RMSProp` , :code:`Adam` and so on. These optimization algorithms adopt different strategies to update the model parameters; generally, an appropriate algorithm can be chosen according to the specific task and model. No matter which optimization algorithm is adopted, the learning rate is usually an important hyperparameter that needs to be specified and carefully tuned through experiments. Here we take stochastic gradient descent as an example:
.. code-block:: python
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
For more optimization operators, please refer to :code:`fluid.optimizer` .
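Putting the four parts above together gives the following minimal sketch; it is for illustration only, and the operators are added to :code:`fluid.default_main_program()` by default:

.. code-block:: python

    import paddle.fluid as fluid

    # data layer: describe the format of one sample and its label
    x = fluid.data(name='x', shape=[13], dtype='float32')
    y = fluid.data(name='y', shape=[1], dtype='float32')

    # forward computing: a single fc operator for the linear transformation
    y_predict = fluid.layers.fc(input=x, size=1, act=None)

    # loss function: mean squared error, reduced over the batch
    cost = fluid.layers.square_error_cost(input=y_predict, label=y)
    avg_cost = fluid.layers.mean(cost)

    # optimization method: stochastic gradient descent minimizing the mean loss
    sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
    sgd_optimizer.minimize(avg_cost)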
What to do next?
#################
While implementing a model with PaddlePaddle, you need to pay attention to the **Data Layer** , **Forward Computing Logic** , **Loss Function** and **Optimization Method** . Different tasks require different data formats, computing logic, loss functions and optimization methods.
PaddlePaddle provides a rich set of model examples, which you can refer to when building your own model structure. You can visit `Model Repository <https://github.com/PaddlePaddle/models/tree/develop/fluid>`_ to see the official examples.
############
编程实践
############
如果您已经掌握了基本概念中的内容,期望可以针对实际问题建模、搭建自己网络,本模块提供了一些 Paddle 的使用细节供您参考:
.. toctree::
:maxdepth: 1
configure_simple_model/index_cn.rst
single_node.rst
save_load_variables.rst
###############
Coding Practice
###############
If you have mastered the basic concepts and you expect to model and build your own network according to the actual problems, this module provides you with some details about the use of paddle for your reference:
.. toctree::
:maxdepth: 1
configure_simple_model/index_en.rst
single_node_en.rst
test_while_training_en.rst
save_load_variables_en.rst
.. _user_guide_save_load_vars:
################################
模型/变量的保存、载入与增量训练
################################
模型变量分类
############
在PaddlePaddle Fluid中,所有的模型变量都用 :code:`fluid.framework.Variable()` 作为基类。
在该基类之下,模型变量主要可以分为以下几种类别:
1. 模型参数
模型参数是深度学习模型中被训练和学习的变量,在训练过程中,训练框架根据反向传播(backpropagation)算法计算出每一个模型参数当前的梯度,
并用优化器(optimizer)根据梯度对参数进行更新。模型的训练过程本质上可以看做是模型参数不断迭代更新的过程。
在PaddlePaddle Fluid中,模型参数用 :code:`fluid.framework.Parameter` 来表示,
这是一个 :code:`fluid.framework.Variable()` 的派生类,除了具有 :code:`fluid.framework.Variable()` 的各项性质以外,
:code:`fluid.framework.Parameter` 还可以配置自身的初始化方法、更新率等属性。
2. 长期变量
长期变量指的是在整个训练过程中持续存在、不会因为一个迭代的结束而被销毁的变量,例如动态调节的全局学习率等。
在PaddlePaddle Fluid中,长期变量通过将 :code:`fluid.framework.Variable()` 的 :code:`persistable`
属性设置为 :code:`True` 来表示。所有的模型参数都是长期变量,但并非所有的长期变量都是模型参数(创建长期变量的示意代码见下文列表之后)。
3. 临时变量
不属于上面两个类别的所有模型变量都是临时变量,这种类型的变量只在一个训练迭代中存在,在每一个迭代结束后,
所有的临时变量都会被销毁,然后在下一个迭代开始之前,又会先构造出新的临时变量供本轮迭代使用。
一般情况下模型中的大部分变量都属于这一类别,例如输入的训练数据、一个普通的layer的输出等等。
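下面给出一个创建长期变量的示意代码。这里借助 :code:`fluid.layers.create_global_var` 创建一个 :code:`persistable=True` 的变量,其中变量名 step_counter 只是一个假设的例子:

.. code-block:: python

    import paddle.fluid as fluid

    # 示意:创建一个长期变量(persistable=True),名字 step_counter 为假设
    counter = fluid.layers.create_global_var(
        shape=[1], value=0.0, dtype='float32',
        persistable=True, name='step_counter')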
如何保存模型变量
################
根据用途的不同,我们需要保存的模型变量也是不同的。例如,如果我们只是想保存模型用来进行以后的预测,
那么只保存模型参数就够用了。但如果我们需要保存一个checkpoint(检查点,类似于存档,存有复现目前模型的必要信息)以备将来恢复训练,
那么我们应该将各种长期变量都保存下来,甚至还需要记录一下当前的epoch和step的id。
因为一些模型变量虽然不是参数,但对于模型的训练依然必不可少。
save_vars、save_params、save_persistables 以及 save_inference_model的区别
##########################################################################
1. :code:`save_inference_model` 会根据用户配置的 :code:`feeded_var_names` 和 :code:`target_vars` 进行网络裁剪,保存下裁剪后的网络结构的 ``__model__`` 以及裁剪后网络中的长期变量
2. :code:`save_persistables` 不会保存网络结构,会保存网络中的全部长期变量到指定位置。
3. :code:`save_params` 不会保存网络结构,会保存网络中的全部模型参数到指定位置。
4. :code:`save_vars` 不会保存网络结构,会根据用户指定的 :code:`fluid.framework.Parameter` 列表进行保存。
:code:`save_persistables` 保存的网络参数是最全面的,如果是增量训练或者恢复训练, 请选择 :code:`save_persistables` 进行变量保存。
:code:`save_inference_model` 会保存网络参数及裁剪后的模型,如果后续要做预测相关的工作, 请选择 :code:`save_inference_model` 进行变量和网络的保存。
:code:`save_vars` 和 :code:`save_params` 仅在用户了解清楚用途及特殊目的情况下使用, 一般不建议使用。
保存模型用于对新样本的预测
==========================
如果我们保存模型的目的是用于对新样本的预测,那么只保存模型参数就足够了。我们可以使用
:code:`fluid.io.save_params()` 接口来进行模型参数的保存。
例如:
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
param_path = "./my_paddle_model"
prog = fluid.default_main_program()
fluid.io.save_params(executor=exe, dirname=param_path, main_program=None)
上面的例子中,通过调用 :code:`fluid.io.save_params` 函数,PaddlePaddle Fluid会对默认
:code:`fluid.Program` 也就是 :code:`prog` 中的所有模型变量进行扫描,
筛选出其中所有的模型参数,并将这些模型参数保存到指定的 :code:`param_path` 之中。
如何载入模型变量
################
与模型变量的保存相对应,我们提供了两套API来分别载入模型的参数和载入模型的长期变量,分别为保存、加载模型参数的 ``save_params()`` 、 ``load_params()`` 和
保存、加载长期变量的 ``save_persistables`` 、 ``load_persistables`` 。
载入模型用于对新样本的预测
==========================
对于通过 :code:`fluid.io.save_params` 保存的模型,可以使用 :code:`fluid.io.load_params`
来进行载入。
例如:
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
param_path = "./my_paddle_model"
prog = fluid.default_main_program()
fluid.io.load_params(executor=exe, dirname=param_path,
main_program=prog)
上面的例子中,通过调用 :code:`fluid.io.load_params` 函数,PaddlePaddle Fluid会对
:code:`prog` 中的所有模型变量进行扫描,筛选出其中所有的模型参数,
并尝试从 :code:`param_path` 之中读取加载它们。
需要格外注意的是,这里的 :code:`prog` 必须和调用 :code:`fluid.io.save_params`
时所用的 :code:`prog` 中的前向部分完全一致,且不能包含任何参数更新的操作。如果两者存在不一致,
那么可能会导致一些变量未被正确加载;如果错误地包含了参数更新操作,那可能会导致正常预测过程中参数被更改。
这两个 :code:`fluid.Program` 之间的关系类似于训练 :code:`fluid.Program`
和测试 :code:`fluid.Program` 之间的关系,详见: :ref:`user_guide_test_while_training`。
另外,需特别注意运行 :code:`fluid.default_startup_program()` 必须在调用 :code:`fluid.io.load_params`
之前。如果在之后运行,可能会覆盖已加载的模型参数导致错误。
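一个体现上述调用顺序的最小示意如下(沿用上文的 :code:`./my_paddle_model` 路径):

.. code-block:: python

    import paddle.fluid as fluid

    exe = fluid.Executor(fluid.CPUPlace())
    param_path = "./my_paddle_model"
    prog = fluid.default_main_program()

    # 先运行 startup program,完成参数的初始化
    exe.run(fluid.default_startup_program())
    # 再加载已保存的参数,覆盖随机初始化的结果
    fluid.io.load_params(executor=exe, dirname=param_path, main_program=prog)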
通过numpy数组设置模型参数值
===========================
用户可以灵活地使用numpy数组设置模型参数的值,具体示例如下:
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
main_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
data = fluid.layers.data(name="img", shape=[64, 784], append_batch_size=False)
w = fluid.layers.create_parameter(shape=[784, 200], dtype='float32', name='fc_w')
b = fluid.layers.create_parameter(shape=[200], dtype='float32', name='fc_b')
hidden_w = fluid.layers.matmul(x=data, y=w)
hidden_b = fluid.layers.elementwise_add(hidden_w, b)
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_prog)
for block in main_prog.blocks:
for param in block.all_parameters():
pd_var = fluid.global_scope().find_var(param.name)
pd_param = pd_var.get_tensor()
print("load: {}, shape: {}".format(param.name, param.shape))
print("Before setting the numpy array value: {}".format(np.array(pd_param).ravel()[:5]))
pd_param.set(np.ones(param.shape), place)
print("After setting the numpy array value: {}".format(np.array(pd_param).ravel()[:5]))
# 输出结果:
# load: fc_w, shape: (784, 200)
# Before setting the numpy array value: [ 0.00121664 0.00700346 -0.05220041 -0.05879825 0.05155897]
# After setting the numpy array value: [1. 1. 1. 1. 1.]
# load: fc_b, shape: (200,)
# Before setting the numpy array value: [-0.098886 -0.00530401 -0.05821943 -0.01038218 0.00760134]
# After setting the numpy array value: [1. 1. 1. 1. 1.]
预测模型的保存和加载
##############################
预测引擎提供了存储预测模型 :code:`fluid.io.save_inference_model` 和加载预测模型 :code:`fluid.io.load_inference_model` 两个接口。
- :code:`fluid.io.save_inference_model`:请参考 :ref:`api_guide_inference`。
- :code:`fluid.io.load_inference_model`:请参考 :ref:`api_guide_inference`。
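下面给出一个最小的保存与加载示意,其中的网络结构、输入变量 image 与输出 prediction 均为假设,仅用于说明两个接口的配合方式:

.. code-block:: python

    import paddle.fluid as fluid

    # 示意:构建一个最简单的网络(image、prediction 为假设的变量)
    image = fluid.data(name='image', shape=[None, 784], dtype='float32')
    prediction = fluid.layers.fc(input=image, size=10, act='softmax')

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    path = "./infer_model"
    # 保存:根据 feeded_var_names 与 target_vars 裁剪网络并保存
    fluid.io.save_inference_model(dirname=path,
                                  feeded_var_names=['image'],
                                  target_vars=[prediction],
                                  executor=exe)

    # 加载:返回裁剪后的 program、输入变量名列表与输出变量列表
    [inference_program, feed_names, fetch_targets] = \
        fluid.io.load_inference_model(dirname=path, executor=exe)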
增量训练
############
增量训练指一个学习系统能不断地从新样本中学习新的知识,并能保存大部分以前已经学习到的知识。因此增量学习涉及到两点:在上一次训练结束的时候保存需要的长期变量, 在下一次训练开始的时候加载上一次保存的这些长期变量。 因此增量训练涉及到如下几个API:
:code:`fluid.io.save_persistables`、:code:`fluid.io.load_persistables` 。
单机增量训练
==========================
单机的增量训练的一般步骤如下:
1. 在训练的最后调用 :code:`fluid.io.save_persistables` 保存持久性参数到指定的位置。
2. 在训练的startup_program通过执行器 :code:`Executor` 执行成功之后调用 :code:`fluid.io.load_persistables` 加载之前保存的持久性参数。
3. 通过执行器 :code:`Executor` 或者 :code:`ParallelExecutor` 继续训练。
例如:
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
path = "./models"
prog = fluid.default_main_program()
fluid.io.save_persistables(exe, path, prog)
上面的例子中,通过调用 :code:`fluid.io.save_persistables` 函数,PaddlePaddle Fluid会从默认 :code:`fluid.Program` 也就是 :code:`prog` 的所有模型变量中找出长期变量,并将它们保存到指定的 :code:`path` 目录下。
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
path = "./models"
startup_prog = fluid.default_startup_program()
exe.run(startup_prog)
main_prog = fluid.default_main_program()
fluid.io.load_persistables(exe, path, main_prog)
exe.run(main_prog)
上面的例子中,通过调用 :code:`fluid.io.load_persistables` 函数,PaddlePaddle Fluid会从默认
:code:`fluid.Program` 也就是 :code:`main_prog` 的所有模型变量中找出长期变量,从指定的 :code:`path` 目录中将它们一一加载, 然后再继续进行训练。
多机增量训练(不带分布式大规模稀疏矩阵)
==========================
多机增量训练和单机增量训练有若干不同点:
1. 在训练的最后调用 :code:`fluid.io.save_persistables` 保存长期变量时,不必要所有的trainer都调用这个方法来保存,一般0号trainer来保存即可。
2. 多机增量训练的参数加载在PServer端,trainer端不用加载参数。在PServer全部启动后,trainer会从PServer端同步参数。
3. 在确认需要使用增量的情况下, 多机在调用 :code:`fluid.DistributeTranspiler.transpile` 时需要指定 ``current_endpoint`` 参数。
多机增量(不带分布式大规模稀疏矩阵)训练的一般步骤为:
1. 0号trainer在训练的最后调用 :code:`fluid.io.save_persistables` 保存持久性参数到指定的 :code:`path` 下。
2. 通过HDFS等方式将0号trainer保存下来的所有的参数共享给所有的PServer(每个PServer都需要有完整的参数)。
3. PServer在训练的startup_program通过执行器(:code:`Executor`)执行成功之后调用 :code:`fluid.io.load_persistables` 加载0号trainer保存的持久性参数。
4. PServer通过执行器 :code:`Executor` 继续启动 :code:`pserver_prog` 。
5. 所有的训练节点trainer通过执行器 :code:`Executor` 或者 :code:`ParallelExecutor` 正常训练。
对于训练过程中待保存参数的trainer, 例如:
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
path = "./models"
trainer_id = 0
if trainer_id == 0:
prog = fluid.default_main_program()
fluid.io.save_persistables(exe, path, prog)
.. code-block:: bash
hadoop fs -mkdir /remote/$path
hadoop fs -put $path /remote/$path
上面的例子中,0号trainer通过调用 :code:`fluid.io.save_persistables` 函数,让PaddlePaddle Fluid从默认
:code:`fluid.Program` 也就是 :code:`prog` 的所有模型变量中找出长期变量,并将它们保存到指定的 :code:`path` 目录下;然后通过第三方的文件系统(如HDFS)将存储的模型上传到所有PServer都可访问的位置。
对于训练过程中待载入参数的PServer, 例如:
.. code-block:: bash
hadoop fs -get /remote/$path $path
.. code-block:: python
    import paddle.fluid as fluid

    exe = fluid.Executor(fluid.CPUPlace())
    path = "./models"
    pserver_endpoints = "127.0.0.1:1001,127.0.0.1:1002"
    trainers = 4
    trainer_id = 0
    training_role = "PSERVER"
    # 增量训练时,transpile 需要指定本节点的 current_endpoint
    current_endpoint = "127.0.0.1:1001"
    config = fluid.DistributeTranspilerConfig()
    t = fluid.DistributeTranspiler(config=config)
    t.transpile(trainer_id, pservers=pserver_endpoints, trainers=trainers,
                sync_mode=True, current_endpoint=current_endpoint)
    if training_role == "PSERVER":
        pserver_prog = t.get_pserver_program(current_endpoint)
        pserver_startup = t.get_startup_program(current_endpoint, pserver_prog)
        exe.run(pserver_startup)
        fluid.io.load_persistables(exe, path, pserver_prog)
        exe.run(pserver_prog)
    if training_role == "TRAINER":
        main_program = t.get_trainer_program()
        exe.run(main_program)
上面的例子中,每个PServer通过调用HDFS的命令获取到0号trainer保存的参数,并通过配置获取到PServer端的 :code:`fluid.Program` 。PaddlePaddle Fluid会从该
:code:`fluid.Program` 也就是 :code:`pserver_prog` 的所有模型变量中找出长期变量,并从指定的 :code:`path` 目录中一一加载。
.. _user_guide_save_load_vars_en:
######################################################
Save, Load Models or Variables & Incremental Learning
######################################################
Model variable classification
##############################
In PaddlePaddle Fluid, all model variables use :code:`fluid.framework.Variable()` as the base class. Under this base class, model variables can be divided into the following categories:
1. Model parameter
The model parameters are the variables trained and learned in the deep learning model. During the training process, the training framework calculates the current gradient of each model parameter according to the back propagation algorithm, and updates the parameters according to their gradients by the optimizer. The essence of the training process of a model can be seen as the process of continuously iterative updating of model parameters. In PaddlePaddle Fluid, the model parameters are represented by :code:`fluid.framework.Parameter` , which is a derived class of :code:`fluid.Variable()` . Besides various properties of :code:`fluid.Variable()` , :code:`fluid.framework.Parameter` can also be configured with its own initialization methods, update rate and other properties.
2. Persistable variable
Persistable variables refer to variables that persist throughout the training process and are not destroyed at the end of an iteration, such as a dynamically adjusted global learning rate. In PaddlePaddle Fluid, persistable variables are represented by setting the :code:`persistable` property of :code:`fluid.Variable()` to :code:`True` . All model parameters are persistable variables, but not all persistable variables are model parameters (see the sketch after this list).
3. Temporary variables
All model variables that do not belong to the above two categories are temporary variables. This type of variable exists only within one training iteration. After each iteration, all temporary variables are destroyed, and before the next iteration begins, a new set of temporary variables is constructed for that iteration. In general, most variables in a model belong to this category, such as the input training data and the outputs of ordinary layers.
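As a small sketch, a persistable variable can be created with :code:`fluid.layers.create_global_var` by setting :code:`persistable=True` ; the name ``step_counter`` below is only a hypothetical example:

.. code-block:: python

    import paddle.fluid as fluid

    # a sketch: create a persistable variable (the name 'step_counter' is hypothetical)
    counter = fluid.layers.create_global_var(
        shape=[1], value=0.0, dtype='float32',
        persistable=True, name='step_counter')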
How to save model variables
############################
The model variables we need to save are different depending on the application. For example, if we just want to save the model for future predictions, just saving the model parameters will be enough. But if we need to save a checkpoint for future recovery of current training, then we should save all the persistable variables, and even record the current epoch and step id. It is because even though some model variables are not parameters, they are still essential for model training.
Difference between save_vars, save_params, save_persistables and save_inference_model
######################################################################################
1. :code:`save_inference_model` prunes the network according to the :code:`feeded_var_names` and :code:`target_vars` given by the user, and saves the ``__model__`` file of the pruned network structure together with the persistable variables in the pruned network.
2. :code:`save_persistables` does not save the network structure; it saves all the persistable variables in the network to the specified location.
3. :code:`save_params` does not save the network structure; it saves all the model parameters in the network to the specified location.
4. :code:`save_vars` does not save the network structure; it saves the list of :code:`fluid.framework.Parameter` specified by the user.
:code:`save_persistables` saves the most comprehensive set of variables, such as parameter variables and optimizer variables. For incremental training or training recovery, please choose :code:`save_persistables` .
:code:`save_inference_model` saves the pruned network together with its persistable variables. If you need the program and variables for follow-up high-performance inference, please choose :code:`save_inference_model` .
:code:`save_vars` and :code:`save_params` are only needed in particular cases where you clearly understand their purposes; they are not recommended for general use.
Save the model to make prediction for new samples
===================================================
If we save the model to make prediction for new samples, just saving the model parameters will be sufficient. We can use the :code:`fluid.io.save_params()` interface to save model parameters.
For example:
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
param_path = "./my_paddle_model"
prog = fluid.default_main_program()
fluid.io.save_params(executor=exe, dirname=param_path, main_program=None)
In the example above, by calling the :code:`fluid.io.save_params` function, PaddlePaddle Fluid scans all model variables in the default :code:`fluid.Program` , i.e. :code:`prog` and picks out all model parameters. All these model parameters are saved to the specified :code:`param_path` .
How to load model variables
#############################
Corresponding to the saving of model variables, we provide two pairs of APIs: ``save_params()`` / ``load_params()`` for saving and loading model parameters, and ``save_persistables`` / ``load_persistables`` for saving and loading persistable variables.
Load model to make predictions for new samples
================================================
For models saved with :code:`fluid.io.save_params` , you can load them with :code:`fluid.io.load_params`.
For example:
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
param_path = "./my_paddle_model"
prog = fluid.default_main_program()
fluid.io.load_params(executor=exe, dirname=param_path,
main_program=prog)
In the above example, by calling the :code:`fluid.io.load_params` function, PaddlePaddle Fluid will scan all the model variables in :code:`prog`, filter out all the model parameters, and try to load them from :code:`param_path` .
It is important to note that the :code:`prog` here must be exactly the same as the forward part of the :code:`prog` used when calling :code:`fluid.io.save_params` and cannot contain any operations of parameter updates. If there is an inconsistency between the two, it may cause some variables not to be loaded correctly; if the parameter update operation is incorrectly included, it may cause the parameters to be changed during normal prediction. The relationship between these two :code:`fluid.Program` is similar to the relationship between training :code:`fluid.Program` and test :code:`fluid.Program`, see: :ref:`user_guide_test_while_training_en` .
In addition, special care must be taken that :code:`fluid.default_startup_program()` **must** be run before calling :code:`fluid.io.load_params` . If you run it later, it may overwrite the loaded model parameters and cause an error.
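A minimal sketch of the correct ordering, reusing the :code:`./my_paddle_model` path from the example above:

.. code-block:: python

    import paddle.fluid as fluid

    exe = fluid.Executor(fluid.CPUPlace())
    param_path = "./my_paddle_model"
    prog = fluid.default_main_program()

    # run the startup program first to initialize the parameters
    exe.run(fluid.default_startup_program())
    # then load the saved parameters, overwriting the random initialization
    fluid.io.load_params(executor=exe, dirname=param_path, main_program=prog)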
Save and Load the Inference Model
########################################
The inference engine provides two interfaces: :code:`fluid.io.save_inference_model` for saving the inference model and :code:`fluid.io.load_inference_model` for loading it.
- :code:`fluid.io.save_inference_model`: Please refer to :ref:`api_guide_inference` .
- :code:`fluid.io.load_inference_model`: Please refer to :ref:`api_guide_inference` .
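The following minimal sketch shows how the two interfaces work together; the network, the input name ``image`` and the output ``prediction`` are hypothetical and only for illustration:

.. code-block:: python

    import paddle.fluid as fluid

    # a sketch: build a trivial network (image and prediction are hypothetical)
    image = fluid.data(name='image', shape=[None, 784], dtype='float32')
    prediction = fluid.layers.fc(input=image, size=10, act='softmax')

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    path = "./infer_model"
    # save: prune the network by feeded_var_names / target_vars and save it
    fluid.io.save_inference_model(dirname=path,
                                  feeded_var_names=['image'],
                                  target_vars=[prediction],
                                  executor=exe)

    # load: returns the pruned program, feed names and fetch targets
    [inference_program, feed_names, fetch_targets] = \
        fluid.io.load_inference_model(dirname=path, executor=exe)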
Incremental training
#####################
Incremental training means that a learning system can continuously learn new knowledge from new samples while preserving most of the previously learned knowledge. It therefore involves two points: saving the needed persistable variables at the end of one training run, and loading them at the beginning of the next. Accordingly, incremental training involves the following APIs:
:code:`fluid.io.save_persistables`, :code:`fluid.io.load_persistables` .
Single-node incremental training
=================================
The general steps of incremental training on a single node are as follows:
1. At the end of the training, call :code:`fluid.io.save_persistables` to save the persistable parameter to the specified location.
2. After the training startup_program is executed successfully by the executor :code:`Executor`, call :code:`fluid.io.load_persistables` to load the previously saved persistable parameters.
3. Continue training with the executor :code:`Executor` or :code:`ParallelExecutor`.
Example:
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
path = "./models"
prog = fluid.default_main_program()
fluid.io.save_persistables(exe, path, prog)
In the above example, by calling the :code:`fluid.io.save_persistables` function, PaddlePaddle Fluid will find all persistable variables among the model variables in the default :code:`fluid.Program` , i.e. :code:`prog` , and save them to the specified :code:`path` directory.
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
path = "./models"
startup_prog = fluid.default_startup_program()
exe.run(startup_prog)
main_prog = fluid.default_main_program()
fluid.io.load_persistables(exe, path, main_prog)
exe.run(main_prog)
In the above example, by calling the :code:`fluid.io.load_persistables` function, PaddlePaddle Fluid will find the persistable variables among the model variables in the default :code:`fluid.Program` , i.e. :code:`main_prog` , load them one by one from the specified :code:`path` directory, and then continue training.
The general steps for multi-node incremental training (without distributed large-scale sparse matrices)
=========================================================================================================
There are several differences between multi-node incremental training and single-node incremental training:
1. At the end of training, when :code:`fluid.io.save_persistables` is called to save the persistable parameters, not all trainers need to call it; usually only trainer 0 does.
2. The parameters in multi-node incremental training are loaded on the PServer side; the trainer side does not need to load them. After all PServers have started, the trainers will synchronize the parameters from the PServers.
3. When incremental training is required, the ``current_endpoint`` argument must be specified when calling :code:`fluid.DistributeTranspiler.transpile` .
The general steps for multi-node incremental training (without distributed large-scale sparse matrices) are:
1. At the end of training, trainer 0 calls :code:`fluid.io.save_persistables` to save the persistable parameters to the specified :code:`path` .
2. Share all the parameters saved by trainer 0 with all PServers through HDFS or other methods (each PServer needs the complete set of parameters).
3. After the PServer's startup_program has been successfully executed by the executor ( :code:`Executor` ), each PServer calls :code:`fluid.io.load_persistables` to load the persistable parameters saved by trainer 0.
4. Each PServer then continues to run :code:`pserver_prog` via the executor :code:`Executor` .
5. All trainers conduct the training process normally through the executor :code:`Executor` or :code:`ParallelExecutor` .
For trainers whose parameters are to be saved during training, for example:
.. code-block:: python
import paddle.fluid as fluid
exe = fluid.Executor(fluid.CPUPlace())
path = "./models"
trainer_id = 0
if trainer_id == 0:
prog = fluid.default_main_program()
fluid.io.save_persistables(exe, path, prog)
.. code-block:: bash
hadoop fs -mkdir /remote/$path
hadoop fs -put $path /remote/$path
In the above example, trainer 0 calls the :code:`fluid.io.save_persistables` function; PaddlePaddle Fluid finds all persistable variables among the model variables of the default :code:`fluid.Program` , i.e. :code:`prog` , and saves them to the specified :code:`path` directory. The saved model is then uploaded through a third-party file system (such as HDFS) to a location accessible to all PServers.
For a PServer that loads the parameters during training, for example:
.. code-block:: python
    import paddle.fluid as fluid

    exe = fluid.Executor(fluid.CPUPlace())
    path = "./models"
    pserver_endpoints = "127.0.0.1:1001,127.0.0.1:1002"
    trainers = 4
    trainer_id = 0
    training_role = "PSERVER"
    # for incremental training, transpile needs the node's current_endpoint
    current_endpoint = "127.0.0.1:1001"
    config = fluid.DistributeTranspilerConfig()
    t = fluid.DistributeTranspiler(config=config)
    t.transpile(trainer_id, pservers=pserver_endpoints, trainers=trainers,
                sync_mode=True, current_endpoint=current_endpoint)
    if training_role == "PSERVER":
        pserver_prog = t.get_pserver_program(current_endpoint)
        pserver_startup = t.get_startup_program(current_endpoint, pserver_prog)
        exe.run(pserver_startup)
        fluid.io.load_persistables(exe, path, pserver_prog)
        exe.run(pserver_prog)
    if training_role == "TRAINER":
        main_program = t.get_trainer_program()
        exe.run(main_program)
In the above example, each PServer obtains the parameters saved by trainer 0 via the HDFS commands and obtains its own :code:`fluid.Program` from the transpiler configuration. PaddlePaddle Fluid will find all persistable variables among the model variables of this :code:`fluid.Program` , i.e. :code:`pserver_prog` , and load them from the specified :code:`path` directory.
########
单机训练
########
准备工作
########
要进行PaddlePaddle Fluid单机训练,需要先 :ref:`user_guide_prepare_data` 和
:ref:`user_guide_configure_simple_model` 。当\
:ref:`user_guide_configure_simple_model` 完毕后,可以得到两个\
:code:`fluid.Program`, :code:`startup_program` 和 :code:`main_program`。
默认情况下,可以使用 :code:`fluid.default_startup_program()` 与\ :code:`fluid.default_main_program()` 获得全局的 :code:`fluid.Program`。
例如:
.. code-block:: python
import paddle.fluid as fluid
image = fluid.data(name="image", shape=[None, 784], dtype='float32')
label = fluid.data(name="label", shape=[None, 1], dtype='int64')
hidden = fluid.layers.fc(input=image, size=100, act='relu')
prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
loss = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.mean(loss)
sgd = fluid.optimizer.SGD(learning_rate=0.001)
sgd.minimize(loss)
# Here the fluid.default_startup_program() and fluid.default_main_program()
# has been constructed.
在上述模型配置执行完毕后, :code:`fluid.default_startup_program()` 与\
:code:`fluid.default_main_program()` 配置完毕了。
初始化参数
##########
参数随机初始化
==============
用户配置完模型后,参数初始化操作会被写入到\
:code:`fluid.default_startup_program()` 中。使用 :code:`fluid.Executor()` 运行
这一程序,初始化之后的参数默认被放在全局scope中,即 :code:`fluid.global_scope()` 。例如:
.. code-block:: python
exe = fluid.Executor(fluid.CUDAPlace(0))
exe.run(program=fluid.default_startup_program())
载入预定义参数
==============
在神经网络训练过程中,经常会需要载入预定义模型,进而继续进行训练。\
如何载入预定义参数,请参考 :ref:`user_guide_save_load_vars`。
单卡训练
########
执行单卡训练可以使用 :code:`fluid.Executor()` 中的 :code:`run()` 方法,运行训练\
:code:`fluid.Program` 即可。在运行的时候,用户可以通过 :code:`run(feed=...)`\
参数传入数据;用户可以通过 :code:`run(fetch=...)` 获取输出数据。例如:\
.. code-block:: python
import paddle.fluid as fluid
import numpy
train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
data = fluid.data(name='X', shape=[None, 1], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
sgd = fluid.optimizer.SGD(learning_rate=0.001)
sgd.minimize(loss)
use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
# Run the startup program once and only once.
    # No need to optimize/compile the startup program.
startup_program.random_seed=1
exe.run(startup_program)
# Run the main program directly without compile.
x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = exe.run(train_program,
feed={"X": x},
fetch_list=[loss.name])
# Or use CompiledProgram:
compiled_prog = fluid.CompiledProgram(train_program)
loss_data, = exe.run(compiled_prog,
feed={"X": x},
fetch_list=[loss.name])
多卡训练
#######################
在多卡训练中,你可以使用 :code:`fluid.CompiledProgram` 来编译 :code:`fluid.Program` ,然后调用 :code:`with_data_parallel` 。例如:
.. code-block:: python
    # NOTE: If you use CPU to run the program, you need
    # to specify the CPU_NUM, otherwise fluid will use
    # the number of logical cores as the CPU_NUM. In
    # that case, the batch size of the input should be
    # greater than CPU_NUM, or the process will fail
    # with an exception.
if not use_cuda:
os.environ['CPU_NUM'] = str(2)
compiled_prog = fluid.CompiledProgram(
train_program).with_data_parallel(
loss_name=loss.name)
loss_data, = exe.run(compiled_prog,
feed={"X": x},
fetch_list=[loss.name])
注释:
1. :ref:`cn_api_fluid_CompiledProgram` 会将传入的 :code:`fluid.Program` 转为计算图,即Graph,因为 :code:`compiled_prog` 与传入的 :code:`train_program` 是完全不同的对象,目前还不能够对 :code:`compiled_prog` 进行保存。
2. 多卡训练也可以使用 :ref:`cn_api_fluid_ParallelExecutor` ,但是现在推荐使用 :ref:`cn_api_fluid_CompiledProgram` .
3. 如果 :code:`exe` 是用CUDAPlace来初始化的,模型会在GPU中运行。在显卡训练模式中,所有的显卡都将被占用。用户可以配置 `CUDA_VISIBLE_DEVICES <http://www.acceleware.com/blog/cudavisibledevices-masking-gpus>`_ 以更改被占用的显卡。
4. 如果 :code:`exe` 是用CPUPlace来初始化的,模型会在CPU中运行。在这种情况下,多线程用于运行模型,同时线程的数目和逻辑核的数目相等。用户可以配置 ``CPU_NUM`` 以更改使用中的线程数目。
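例如,可以在启动训练前通过环境变量进行配置(下面的取值仅为示意,请按实际机器情况设置):

.. code-block:: python

    import os

    # 示意:CPU 模式下将参与运算的线程数设为 2
    os.environ['CPU_NUM'] = '2'
    # 示意:只允许使用 0、1 号显卡
    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'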
进阶使用
###############
.. toctree::
:maxdepth: 2
test_while_training.rst
#####################
Single-node training
#####################
Preparation
############
To perform single-node training in PaddlePaddle Fluid, you need to read :ref:`user_guide_prepare_data_en` and :ref:`user_guide_configure_simple_model_en` . When you have finished reading :ref:`user_guide_configure_simple_model_en` , you can get two :code:`fluid.Program`, namely :code:`startup_program` and :code:`main_program` . By default, you can use :code:`fluid.default_startup_program()` and :code:`fluid.default_main_program()` to get global :code:`fluid.Program` .
For example:
.. code-block:: python
import paddle.fluid as fluid
image = fluid.data(name="image", shape=[None, 784], dtype='float32')
label = fluid.data(name="label", shape=[None, 1], dtype='int64')
hidden = fluid.layers.fc(input=image, size=100, act='relu')
prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
loss = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.mean(loss)
sgd = fluid.optimizer.SGD(learning_rate=0.001)
sgd.minimize(loss)
# Here the fluid.default_startup_program() and fluid.default_main_program()
# has been constructed.
After the configuration of model, the configurations of :code:`fluid.default_startup_program()` and :code:`fluid.default_main_program()` have been finished.
Initialize Parameters
#######################
Random Initialization of Parameters
====================================
After the model is configured, the parameter initialization operations are written into :code:`fluid.default_startup_program()` . By running this program with :code:`fluid.Executor()` , the initialized parameters are placed in the global scope by default, i.e. :code:`fluid.global_scope()` . For example:
.. code-block:: python
exe = fluid.Executor(fluid.CUDAPlace(0))
exe.run(program=fluid.default_startup_program())
Load Predefined Parameters
===========================
In the neural network training, predefined models are usually loaded to continue training. For how to load predefined parameters, please refer to :ref:`user_guide_save_load_vars_en`.
Single-card Training
#####################
Single-card training can be performed through calling :code:`run()` of :code:`fluid.Executor()` to run training :code:`fluid.Program` .
In the runtime, users can feed data with :code:`run(feed=...)` and get output data with :code:`run(fetch=...)` . For example:
.. code-block:: python
import paddle.fluid as fluid
import numpy
train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
data = fluid.data(name='X', shape=[None, 1], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
sgd = fluid.optimizer.SGD(learning_rate=0.001)
sgd.minimize(loss)
use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
# Run the startup program once and only once.
    # No need to optimize/compile the startup program.
startup_program.random_seed=1
exe.run(startup_program)
# Run the main program directly without compile.
x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = exe.run(train_program,
feed={"X": x},
fetch_list=[loss.name])
# Or use CompiledProgram:
compiled_prog = fluid.CompiledProgram(train_program)
loss_data, = exe.run(compiled_prog,
feed={"X": x},
fetch_list=[loss.name])
Multi-card Training
#######################
In multi-card training, you can use :code:`fluid.CompiledProgram` to compile the :code:`fluid.Program`, and then call :code:`with_data_parallel`. For example:
.. code-block:: python
    # NOTE: If you use CPU to run the program, you need
    # to specify the CPU_NUM, otherwise fluid will use
    # the number of logical cores as the CPU_NUM. In
    # that case, the batch size of the input should be
    # greater than CPU_NUM, or the process will fail
    # with an exception.
if not use_cuda:
os.environ['CPU_NUM'] = str(2)
compiled_prog = fluid.CompiledProgram(
train_program).with_data_parallel(
loss_name=loss.name)
loss_data, = exe.run(compiled_prog,
feed={"X": x},
fetch_list=[loss.name])
Notes:
1. :ref:`api_fluid_CompiledProgram` converts the input Program into a computational graph. :code:`compiled_prog` is a completely different object from the incoming :code:`train_program` , and at present :code:`compiled_prog` cannot be saved.
2. Multi-card training can also use :ref:`api_fluid_ParallelExecutor` , but :ref:`api_fluid_CompiledProgram` is recommended now.
3. If :code:`exe` is initialized with CUDAPlace, the model runs on GPUs, and all visible graphics cards will be occupied. Users can configure `CUDA_VISIBLE_DEVICES <http://www.acceleware.com/blog/cudavisibledevices-masking-gpus>`_ to change which cards are used.
4. If :code:`exe` is initialized with CPUPlace, the model runs on CPU. In this situation, multiple threads are used to run the model, and the number of threads equals the number of logical cores. Users can configure ``CPU_NUM`` to change the number of threads used.
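For example, both can be configured through environment variables before training starts (the values below are only illustrative):

.. code-block:: python

    import os

    # a sketch: use 2 threads/devices in CPU mode
    os.environ['CPU_NUM'] = '2'
    # a sketch: only make GPU cards 0 and 1 visible
    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'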
Advanced Usage
###############
.. toctree::
:maxdepth: 2
test_while_training_en.rst
.. _user_guide_test_while_training:
##################
训练过程中评测模型
##################
模型的测试评价与训练的 :code:`fluid.Program` 不同。在测试评价中:
1. 测试评价不进行反向传播,不优化更新参数。
2. 测试评价执行的操作可以不同。
* 例如 BatchNorm 操作,在训练和测试时执行不同的算法。
* 测试评价模型与训练模型可以是完全不同的模型。
生成测试 :code:`fluid.Program`
#################################
通过克隆训练 :code:`fluid.Program` 生成测试 :code:`fluid.Program`
=======================================================================
用 :code:`Program.clone()` 方法可以复制出新的 :code:`fluid.Program` 。通过设置
:code:`Program.clone(for_test=True)` ,可以复制出一个包含测试所需操作的 :code:`fluid.Program` 。简单的使用方法如下:
.. code-block:: python
import paddle.fluid as fluid
image = fluid.data(name="image", shape=[None, 784], dtype='float32')
label = fluid.data(name="label", shape=[None, 1], dtype="int64")
prediction = fluid.layers.fc(
input=fluid.layers.fc(input=image, size=100, act='relu'),
size=10,
act='softmax'
)
loss = fluid.layers.mean(fluid.layers.cross_entropy(input=prediction, label=label))
acc = fluid.layers.accuracy(input=prediction, label=label)
test_program = fluid.default_main_program().clone(for_test=True)
adam = fluid.optimizer.Adam(learning_rate=0.001)
adam.minimize(loss)
在使用 :code:`Optimizer` 之前,将 :code:`fluid.default_main_program()` 复制\
成一个 :code:`test_program` 。之后使用测试数据运行 :code:`test_program`,\
就可以做到运行测试程序,而不影响训练结果。
分别配置训练 :code:`fluid.Program` 和测试 :code:`fluid.Program`
=====================================================================
如果训练程序和测试程序相差较大时,用户也可以通过完全定义两个不同的
:code:`fluid.Program`,分别进行训练和测试。在PaddlePaddle Fluid中,\
所有的参数都有名字。如果两个不同的操作,甚至两个不同的网络使用了同样名字的参数,\
那么他们的值和内存空间都是共享的。
PaddlePaddle Fluid中使用 :code:`fluid.unique_name` 包来随机初始化用户未定义的\
参数名称。通过 :code:`fluid.unique_name.guard` 可以确保多次调用某函数\
参数初始化的名称一致。
例如:
.. code-block:: python
import paddle.fluid as fluid
def network(is_test):
image = fluid.data(name="image", shape=[None, 784], dtype='float32')
label = fluid.data(name="label", shape=[None, 1], dtype="int64")
hidden = fluid.layers.fc(input=image, size=100, act="relu")
hidden = fluid.layers.batch_norm(input=hidden, is_test=is_test)
...
return loss
with fluid.unique_name.guard():
train_loss = network(is_test=False)
sgd = fluid.optimizer.SGD(0.001)
sgd.minimize(train_loss)
test_program = fluid.Program()
with fluid.unique_name.guard():
with fluid.program_guard(test_program, fluid.Program()):
test_loss = network(is_test=True)
# fluid.default_main_program() is the train program
# fluid.test_program is the test program
执行测试 :code:`fluid.Program`
#################################
使用 :code:`Executor` 执行测试 :code:`fluid.Program`
=======================================================
用户可以使用 :code:`Executor.run(program=...)` 来执行测试
:code:`fluid.Program`。
例如
.. code-block:: python
exe = fluid.Executor(fluid.CPUPlace())
test_acc = exe.run(program=test_program, feed=test_data_batch, fetch_list=[acc])
    print('Test accuracy is ', test_acc)
使用 :code:`ParallelExecutor` 执行测试 :code:`fluid.Program`
===============================================================
用户可以使用训练用的 :code:`ParallelExecutor` 与测试 :code:`fluid.Program`
一起,新建一个测试的 :code:`ParallelExecutor` ;再使用测试
:code:`ParallelExecutor.run` 来执行测试。
例如:
.. code-block:: python
train_exec = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
test_exec = fluid.ParallelExecutor(use_cuda=True, share_vars_from=train_exec,
main_program=test_program)
test_acc = test_exec.run(fetch_list=[acc], ...)
.. _user_guide_test_while_training_en:
##############################
Evaluate model while training
##############################
:code:`fluid.Program` for model test and evaluation is different from the one for training. In the test and evaluation phase:
1. There is no back propagation, and no optimization or parameter update is performed in evaluation and test.
2. The operations executed in model evaluation can be different.
   * Take the BatchNorm operator for example: it executes different algorithms in training and testing.
   * The evaluation model can even be a totally different model from the training model.
Generate :code:`fluid.Program` for test
#######################################
Generate test :code:`fluid.Program` by cloning training :code:`fluid.Program`
================================================================================
:code:`Program.clone()` can create a copy of a :code:`fluid.Program` . By setting :code:`Program.clone(for_test=True)` , the copy contains the operations needed for testing. Simple usage is as follows:
.. code-block:: python
import paddle.fluid as fluid
image = fluid.data(name="image", shape=[None, 784], dtype='float32')
label = fluid.data(name="label", shape=[None, 1], dtype="int64")
prediction = fluid.layers.fc(
input=fluid.layers.fc(input=image, size=100, act='relu'),
size=10,
act='softmax'
)
loss = fluid.layers.mean(fluid.layers.cross_entropy(input=prediction, label=label))
acc = fluid.layers.accuracy(input=prediction, label=label)
test_program = fluid.default_main_program().clone(for_test=True)
adam = fluid.optimizer.Adam(learning_rate=0.001)
adam.minimize(loss)
Before using :code:`Optimizer` , please copy :code:`fluid.default_main_program()` into a :code:`test_program` . Then you can pass test data to :code:`test_program` so that you can run test program without influencing training result.
Configure training :code:`fluid.Program` and test :code:`fluid.Program` individually
=====================================================================================
If the training program is largely different from test program, you can define two totally different :code:`fluid.Program` , and perform training and test individually. In PaddlePaddle Fluid, all parameters are named. If two different operations or even two different networks use parameters with the same name, the value and memory space of these parameters are shared.
Fluid adopts :code:`fluid.unique_name` package to randomly initialize the names of unnamed parameters. :code:`fluid.unique_name.guard` can keep the initialized names consistent across multiple times of calling some function.
For example:
.. code-block:: python
import paddle.fluid as fluid
def network(is_test):
image = fluid.data(name="image", shape=[None, 784], dtype='float32')
label = fluid.data(name="label", shape=[None, 1], dtype="int64")
hidden = fluid.layers.fc(input=image, size=100, act="relu")
hidden = fluid.layers.batch_norm(input=hidden, is_test=is_test)
...
return loss
with fluid.unique_name.guard():
train_loss = network(is_test=False)
sgd = fluid.optimizer.SGD(0.001)
sgd.minimize(train_loss)
test_program = fluid.Program()
with fluid.unique_name.guard():
with fluid.program_guard(test_program, fluid.Program()):
test_loss = network(is_test=True)
# fluid.default_main_program() is the train program
# fluid.test_program is the test program
Perform test :code:`fluid.Program`
###################################
Run test :code:`fluid.Program` with :code:`Executor`
=======================================================
You can run test :code:`fluid.Program` with :code:`Executor.run(program=...)` .
For example:
.. code-block:: python
exe = fluid.Executor(fluid.CPUPlace())
test_acc = exe.run(program=test_program, feed=test_data_batch, fetch_list=[acc])
    print('Test accuracy is ', test_acc)
Run test :code:`fluid.Program` with :code:`ParallelExecutor`
=====================================================================
You can create a new test :code:`ParallelExecutor` from the training :code:`ParallelExecutor` (via :code:`share_vars_from` ) and the test :code:`fluid.Program` , then run the test with :code:`ParallelExecutor.run` .
For example:
.. code-block:: python
train_exec = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
test_exec = fluid.ParallelExecutor(use_cuda=True, share_vars_from=train_exec,
main_program=test_program)
test_acc = test_exec.run(fetch_list=[acc], ...)
# 简介
PaddleHapi是飞桨新推出的高层API,是对飞桨API的进一步封装与升级,提供了更加简洁易用的API,进一步提升了飞桨的易学易用性,并增强了飞桨的功能。
PaddleHapi面向从深度学习小白到资深开发者的所有人群:对于AI初学者来说,使用PaddleHapi可以简单快速地构建深度学习项目;对于资深开发者来说,可以使用PaddleHapi快速完成算法迭代。
PaddleHapi具有以下特点:
- 易学易用: 高层API是对普通动态图API的进一步封装和优化,同时保持与普通API的兼容性,高层API使用更加易学易用,同样的实现使用高层API可以节省大量的代码。
- 低代码开发: 使用飞桨高层API的一个明显特点是,用户可编程代码量大大缩减。
- 动静转换: 高层API支持动静转换,用户只需要改一行代码即可实现将动态图代码在静态图模式下训练,既方便用户使用动态图调试模型,又提升了模型训练效率。
在功能增强与使用方式上,高层API有以下升级:
1. 模型训练方式升级: 高层API中封装了Model类,继承了Model类的神经网络可以仅用几行代码完成模型的训练。
2. 新增图像处理模块transform: 飞桨新增了图像预处理模块,其中包含十数种数据处理函数,基本涵盖了常用的数据处理、数据增强方法。
3. 提供常用的神经网络模型可供调用: 高层API中集成了计算机视觉领域和自然语言处理领域常用模型,包括但不限于mobilenet、resnet、yolov3、cyclegan、bert、transformer、seq2seq等等。同时发布了对应模型的预训练模型,用户可以直接使用这些模型或者在此基础上完成二次开发。
![](./image/hapi_gif.gif)
## 目录
* [特性]()
* [快速使用]()
* [新增功能]()
* [使用示例]()
## 特性
### 易学易用
高层API基于飞桨动态图实现,兼容飞桨动态图的所有功能,既秉承了动态图易学、易用、易调试的特点,又对飞桨的动态图做了进一步的封装与优化。
### 低代码开发
相比于动态图的算法实现,使用高层API实现的算法可编程代码量更少:原始的动态图训练代码需要20多行才能完成模型的训练,使用高层API后,仅用8行代码即可实现相同的功能。
使用普通API与高层API实现手写字符识别对比如下图,左边是普通动态图API的实现,右边是使用高层API的实现,可以明显发现,使用高层API的代码量更少。
![](./image/new_hapi.png)
### 动静统一
高层API中实现了动静统一,用户无需感知到静态图、动态图的区别,只需要改一行代码即可实现将动态图代码在静态图模式下训练。动态图更方便调试模型,静态图的训练方式训练效率更高。
高层API默认采用静态图的训练方式,我们可以使用 fluid.enable_dygraph() 切换到动态图模式下运行。
```
place = fluid.CUDAPlace(0)
# 一行代码切换动态图训练模式
fluid.enable_dygraph(place)
# 声明网络结构
model = Mnist("mnist")
# 定义优化器
optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.001, parameter_list=model.parameters())
# 调用prepare() 完成训练的配置
model.prepare(optimizer, CrossEntropy(), Accuracy(), inputs, labels, device='gpu')
# 调用 fit(),启动模型的训练
model.fit(train_dataset, val_dataset, batch_size=100, epochs=1, log_freq=100, save_dir="./output/")
```
## 快速使用
以mnist手写字符识别为例,介绍飞桨高层API的使用方式。
### 1. 搭建网络结构
使用高层API组建网络与动态图的组网方式基本相同,唯一的区别在于:使用高层API组建网络需要继承Model这个类,而普通的动态图组网需要继承dygraph.Layer类。
高层API组网方式如下
```
from paddle.incubate.hapi.model import Model, Input
from paddle.incubate.hapi.loss import CrossEntropy
# Linear 层需要从 fluid.dygraph 中导入
from paddle.fluid.dygraph import Linear
class Mnist(Model):
def __init__(self, name_scope):
super(Mnist, self).__init__()
self.fc = Linear(input_dim=784, output_dim=10, act="softmax")
# 定义网络结构的前向计算过程
def forward(self, inputs):
outputs = self.fc(inputs)
return outputs
```
### 2. 训练准备
在开始训练前,需要定义优化器、损失函数、度量函数,准备数据等等。这些过程均可以在高层API Model类中的prepare函数中完成。
```
# 定义输入数据格式
inputs = [Input([None, 784], 'float32', name='image')]
labels = [Input([None, 1], 'int64', name='label')]
# 声明网络结构
model = Mnist("mnist")
optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.001, parameter_list=model.parameters())
# 使用高层API的 prepare() 完成训练的配置
model.prepare(optimizer, CrossEntropy(), Accuracy(), inputs, labels, device='gpu')
```
### 3. 启动训练
使用高层API完成训练时,只需一行代码即可替代原本控制训练轮数和数据读取过程的双层循环。
```
from paddle.incubate.hapi.datasets.mnist import MNIST as MnistDataset
# 定义数据读取器
train_dataset = MnistDataset(mode='train')
val_dataset = MnistDataset(mode='test')
# 启动训练
model.fit(train_dataset, val_dataset, batch_size=100, epochs=10, log_freq=100, save_dir="./output/")
```
高层API中通过fit函数完成训练的循环过程,只需要设置训练的数据读取器、batch_size大小、迭代轮数epoch、训练日志打印频率log_freq以及保存模型的路径即可。
## 新增功能
除了使用高层API实现一行代码启动训练外,还新增了以下功能:
- transform 数据增强模块
- paddlevision 模型调用模块
### transform
vision.transforms:图像预处理模块transforms包括一系列的图像增强与图像处理实现,对处理计算机视觉相关的任务有很大帮助。
下表中列出Transform支持的数据处理和数据增强API,如下所示:
| transform的数据处理实现 | 函数功能 |
| :-------- | :----- |
| Compose | 组合多种数据变换 |
| Resize | 将图像转换为固定大小 |
| RandomResizedCrop | 根据输入比例对图像做随机剪切,然后resize到指定大小 |
| CenterCrop | 以图像的中心为中心对图像做剪切 |
| CenterCropResize | 对图像做padding,对padding后的图像做centercrop,然后resize到指定大小 |
| RandomHorizontalFlip | 随机对图像做水平翻转 |
| RandomVerticalFlip | 随机对图像做垂直翻转 |
| Permute | 将数据的维度换位 |
| Normalize | 用指定的均值和标准差对数据做归一化 |
| GaussianNoise | 给数据增加高斯噪声 |
| BrightnessTransform | 调整输入图像的亮度 |
| SaturationTransform | 调整输入图像的饱和度 |
| ContrastTransform | 调整输入图像的对比度 |
| HueTransform | 调整图像的色调 |
| ColorJitter | 随机调整图像的亮度、饱和度、对比度和色调 |
使用方法如下:
```
from paddle.incubate.hapi.vision.transforms import transforms
import cv2
img_path = "./output/sample.jpg"
img = cv2.imread(img_path)
# 使用Compose可以将多个数据增强函数组合在一起
trans_funcs = transforms.Compose([transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.BrightnessTransform(0.2)])
label = None
img_processed, label = trans_funcs(img, label)
```
上述代码的效果图如下:
![](./image/hapi_transform.png)
### paddlevision
paddlevision中包含了高层API对常用模型的封装,包括ResNet、VGG、MobileNet、yolov3、darknet、BMN、transformer等。使用这些现成的模型,可以快速完成神经网络的训练和finetune。
使用paddlevision中的模型可以简单快速地构建一个深度学习任务,比如13行代码即可实现resnet在Imagenet数据集上的训练:
![](./image/paddlevision.png)
## 更多使用示例
更多的高层API使用示例请参考:
- [bert](https://github.com/PaddlePaddle/hapi/tree/master/examples/bert)
- [image classification](https://github.com/PaddlePaddle/hapi/tree/master/examples/image_classification)
- [BMN](https://github.com/PaddlePaddle/hapi/tree/master/examples/bmn)
- [cycleGAN](https://github.com/PaddlePaddle/hapi/tree/master/examples/cyclegan)
- [ocr](https://github.com/PaddlePaddle/hapi/tree/master/examples/ocr)
- [TSM](https://github.com/PaddlePaddle/hapi/tree/master/examples/tsm)
- [yolov3](https://github.com/PaddlePaddle/hapi/tree/master/examples/yolov3)
- [transformer](https://github.com/PaddlePaddle/hapi/tree/master/examples/transformer)
- [seq2seq](https://github.com/PaddlePaddle/hapi/tree/master/examples/seq2seq)
- [style-transfer](https://github.com/PaddlePaddle/hapi/tree/master/examples/style-transfer)
@@ -20,3 +20,4 @@ PaddlePaddle (PArallel Distributed Deep LEarning)是一个易用、高效、灵
basic_concept/index_cn.rst
coding_practice/index_cn.rst
hapi.md
################
Beginner's Guide
################
PaddlePaddle (PArallel Distributed Deep LEarning) is a
simple, efficient and extensible deep learning framework.
Please refer to `PaddlePaddle Github <https://github.com/PaddlePaddle/Paddle>`_ for details, and `Release Note <../release_note_en.html>`_ for features incorporated in current version.
Let's start by studying the basic concepts of PaddlePaddle:
- `Basic Concept <../beginners_guide/basic_concept/index_en.html>`_ : introduces the basic concepts and usage of Paddle
If you have mastered the basic concepts of Paddle and expect to model and build your own network for practical problems, you can refer to the details of using Paddle in the Coding Practice section:
- `Coding Practice <../beginners_guide/coding_practice/index_en.html>`_ : introduces how to model and build your own network for practical problems
.. toctree::
:hidden:
basic_concept/index_en.rst
coding_practice/index_en.rst
***
<a name="third_party"></a>
# Appendix
## Compile Dependency Table
<p align="center">
<table>
<thead>
<tr>
<th> Dependency package name </th>
<th> Version </th>
<th> Description </th>
<th> Installation command </th>
</tr>
</thead>
<tbody>
<tr>
<td> CMake </td>
<td> 3.4 </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> GCC </td>
<td> 4.8 / 5.4 </td>
<td> recommends using devtools2 for CentOS </td>
<td> </td>
</tr>
<tr>
<td> Python </td>
<td> 2.7.x </td>
<td> depends on libpython2.7.so </td>
<td> <code> apt install python-dev </code> or <code> yum install python-devel </code></td>
</tr>
<tr>
<td> SWIG </td>
<td> at least 2.0 </td>
<td> </td>
<td> <code>apt install swig </code> or <code> yum install swig </code> </td>
</tr>
<tr>
<td> wget </td>
<td> any </td>
<td> </td>
<td> <code> apt install wget </code> or <code> yum install wget </code> </td>
</tr>
<tr>
<td> openblas </td>
<td> any </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> pip </td>
<td> at least 9.0.1 </td>
<td> </td>
<td> <code> apt install python-pip </code> or <code> yum install Python-pip </code> </td>
</tr>
<tr>
<td> numpy </td>
<td> >=1.12.0 </td>
<td> </td>
<td> <code> pip install numpy==1.14.0 </code> </td>
</tr>
<tr>
<td> protobuf </td>
<td> 3.1.0 </td>
<td> </td>
<td> <code> pip install protobuf==3.1.0 </code> </td>
</tr>
<tr>
<td> wheel </td>
<td> any </td>
<td> </td>
<td> <code> pip install wheel </code> </td>
</tr>
<tr>
<td> patchELF </td>
<td> any </td>
<td> </td>
<td> <code> apt install patchelf </code> or read github <a href="https://gist.github.com/ruario/80fefd174b3395d34c14">patchELF official documentation</a></td>
</tr>
<tr>
<td> go </td>
<td> >=1.8 </td>
<td> optional </td>
<td> </td>
</tr>
</tbody>
</table>
</p>
***
<a name="Compile"></a>
</br></br>
## **Compile Option Table**
<p align="center">
<table>
<thead>
<tr>
<th> Option </th>
<th> Description </th>
<th> Default </th>
</tr>
</thead>
<tbody>
<tr>
<td> WITH_GPU </td>
<td> Whether to support GPU </td>
<td> ON </td>
</tr>
<tr>
<td> WITH_C_API </td>
<td> Whether to compile CAPI </td>
<td> OFF </td>
</tr>
<tr>
<td> WITH_DOUBLE </td>
<td> Whether to use double precision floating point numbers </td>
<td> OFF </td>
</tr>
<tr>
<td> WITH_DSO </td>
<td> Whether to load CUDA dynamic libraries at runtime instead of linking them statically </td>
<td> ON </td>
</tr>
<tr>
<td> WITH_AVX </td>
<td> Whether to compile PaddlePaddle binaries with the AVX instruction set </td>
<td> ON </td>
</tr>
<tr>
<td> WITH_PYTHON </td>
<td> Whether the PYTHON interpreter is embedded </td>
<td> ON </td>
</tr>
<tr>
<td> WITH_STYLE_CHECK </td>
<td> Whether to perform code style checking at compile time </td>
<td> ON </td>
</tr>
<tr>
<td> WITH_TESTING </td>
<td> Whether to turn on unit test </td>
<td> OFF </td>
</tr>
<tr>
<td> WITH_DOC </td>
<td> Whether to compile Chinese and English documents </td>
<td> OFF </td>
</tr>
<tr>
<td> WITH_SWIG_PY </td>
<td> Whether to compile PYTHON's SWIG interface, which can be used for predicting and customizing training </td>
<td> Auto </td>
</tr>
<tr>
<td> WITH_GOLANG </td>
<td> Whether to compile the fault-tolerant parameter server of the go language </td>
<td> OFF </td>
</tr>
<tr>
<td> WITH_MKL </td>
<td> Whether to use the MKL math library; if not, OpenBLAS will be used </td>
<td> ON </td>
</tr>
<tr>
<td> WITH_SYSTEM_BLAS </td>
<td> Whether to use the system's BLAS </td>
<td> OFF </td>
</tr>
<tr>
<td> WITH_DISTRIBUTE </td>
<td> Whether to compile the distributed version </td>
<td> OFF </td>
</tr>
<tr>
<td> WITH_RDMA </td>
<td> Whether to compile the relevant parts that supports RDMA </td>
<td> OFF </td>
</tr>
<tr>
<td> WITH_BRPC_RDMA </td>
<td> Whether to use BRPC RDMA as RPC protocol </td>
<td> OFF </td>
</tr>
<tr>
<td> ON_INFER </td>
<td> Whether to turn on prediction optimization </td>
<td> OFF </td>
</tr>
<tr>
<td> WITH_ANAKIN </td>
<td> Whether to compile with ANAKIN </td>
<td> OFF </td>
</tr>
<tr>
<td> CUDA_ARCH_NAME </td>
<td> Build for which GPU architecture </td>
<td> All: all available GPU architectures; Auto: automatically detect the current GPU architecture </td>
</tr>
<tr>
<td> TENSORRT_ROOT </td>
<td> Specify the TensorRT path </td>
<td> If this flag is not assigned, Paddle will detect TensorRT automatically. </td>
</tr>
</tbody>
</table>
</p>
**BLAS**
PaddlePaddle supports two BLAS libraries, [MKL](https://software.intel.com/en-us/mkl) and [OpenBLAS](http://www.openblas.net/). MKL is used by default. If you use MKL and the machine supports the AVX2 instruction set, the MKL-DNN math library will also be downloaded; for details please refer to [here](https://github.com/PaddlePaddle/Paddle/tree/release/0.11.0/doc/design/mkldnn#cmake).
If you disable MKL, OpenBLAS will be used as the BLAS library.
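For example, to build against OpenBLAS instead of MKL, disable MKL when configuring (a minimal sketch; combine it with whatever other options your build needs):
> `cmake .. -DWITH_MKL=OFF`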
**CUDA/cuDNN**
PaddlePaddle automatically finds the CUDA and cuDNN libraries installed in the system at compile time and runtime. Use the parameter `-DCUDA_ARCH_NAME=Auto` to enable automatic detection of the SM architecture, which speeds up compilation.
PaddlePaddle can be compiled and run with any cuDNN version from v5.1 onward, but use the same cuDNN version for compiling and running. We recommend using the latest version of cuDNN.
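For instance, a GPU build that auto-detects the SM architecture could be configured as follows (a sketch; add your other options as needed):
> `cmake .. -DWITH_GPU=ON -DCUDA_ARCH_NAME=Auto`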
**Configure Compile Options**
PaddlePaddle locates the various BLAS/CUDA/cuDNN libraries through paths specified at compile time. When cmake runs, it first searches the system paths ( `/usr/lib` and `/usr/local/lib` ) for these libraries, and also reads the relevant path variables. The paths can be set with the `-D` option, for example:
> `cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCUDNN_ROOT=/opt/cudnnv5`
**Note**: The settings for these compilation options only take effect on the first cmake run. If you want to change them later, it is recommended to clean up the entire build directory ( `rm -rf` ) and then run cmake again.
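A clean reconfigure might therefore look like this (a sketch, assuming an out-of-source `build` directory):
rm -rf build && mkdir build && cd build
cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF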
***
<a name="whls"></a>
</br></br>
## **Installation Package List**
<p align="center">
<table>
<thead>
<tr>
<th> Version Number </th>
<th> Release Description </th>
</tr>
</thead>
<tbody>
<tr>
<td> paddlepaddle==[version code] such as paddlepaddle==1.5.1 </td>
<td> Installs the CPU-only PaddlePaddle of the specified version; please refer to <a href=https://pypi.org/project/paddlepaddle/#history>Pypi</a> for available versions. </td>
</tr>
<tr>
<td> paddlepaddle-gpu==1.5.1 </td>
<td> Using version 1.5.1 compiled with CUDA 9.0 and cuDNN 7 </td>
</tr>
<tr>
<td> paddlepaddle-gpu==1.5.1.post87 </td>
<td> Using version 1.5.1 compiled with CUDA 8.0 and cuDNN 7 </td>
</tr>
</tbody>
</table>
</p>
You can find various distributions of paddlepaddle-gpu in [the Release History](https://pypi.org/project/paddlepaddle-gpu/#history).
Please note that on Windows, `paddlepaddle-gpu` will install the package compiled with CUDA 8.0 and cuDNN 7.
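For example, based on the table above:
pip install paddlepaddle-gpu==1.5.1          # CUDA 9.0, cuDNN 7
pip install paddlepaddle-gpu==1.5.1.post87   # CUDA 8.0, cuDNN 7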
***
<a name="dockers"></a>
</br></br>
## **Installation Docker Images and Introduction**
<p align="center">
<table>
<thead>
<tr>
<th> Version Number </th>
<th> Release Description </th>
</tr>
</thead>
<tbody>
<tr>
<td> hub.baidubce.com/paddlepaddle/paddle:latest </td>
<td> The latest pre-installed image of the PaddlePaddle CPU version </td>
</tr>
<tr>
<td> hub.baidubce.com/paddlepaddle/paddle:latest-dev </td>
<td> The latest PaddlePaddle development environment </td>
</tr>
<tr>
<td> hub.baidubce.com/paddlepaddle/paddle:[Version] </td>
<td> Replace [Version] with a specific version number to get a pre-installed image of that historical PaddlePaddle release </td>
</tr>
<tr>
<td> hub.baidubce.com/paddlepaddle/paddle:latest-gpu </td>
<td> The latest pre-installed image of the PaddlePaddle GPU version </td>
</tr>
</tbody>
</table>
</p>
You can find the docker image for each release of PaddlePaddle in the [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/).
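For example, to pull one of the images from the table above before running it:
`docker pull hub.baidubce.com/paddlepaddle/paddle:latest`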
***
<a name="ciwhls-release"></a>
</br></br>
## **Multi-version whl package list - Release**
<p align="center">
<table>
<thead>
<tr>
<th> Release Instruction </th>
<th> cp27-cp27mu </th>
<th> cp27-cp27m </th>
<th> cp35-cp35m </th>
<th> cp36-cp36m </th>
<th> cp37-cp37m </th>
</tr>
</thead>
<tbody>
<tr>
<td> cpu-mkl </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-mkl/paddlepaddle-1.5.1-cp27-cp27mu-linux_x86_64.whl">
paddlepaddle-1.5.1-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-mkl/paddlepaddle-1.5.1-cp27-cp27m-linux_x86_64.whl">
paddlepaddle-1.5.1-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-mkl/paddlepaddle-1.5.1-cp35-cp35m-linux_x86_64.whl">
paddlepaddle-1.5.1-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-mkl/paddlepaddle-1.5.1-cp36-cp36m-linux_x86_64.whl">
paddlepaddle-1.5.1-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-mkl/paddlepaddle-1.5.1-cp37-cp37m-linux_x86_64.whl">
paddlepaddle-1.5.1-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> cpu-openblas </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-openblas/paddlepaddle-1.5.1-cp27-cp27mu-linux_x86_64.whl">
paddlepaddle-1.5.1-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-openblas/paddlepaddle-1.5.1-cp27-cp27m-linux_x86_64.whl"> paddlepaddle-1.5.1-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-openblas/paddlepaddle-1.5.1-cp35-cp35m-linux_x86_64.whl">
paddlepaddle-1.5.1-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-openblas/paddlepaddle-1.5.1-cp36-cp36m-linux_x86_64.whl">
paddlepaddle-1.5.1-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-openblas/paddlepaddle-1.5.1-cp37-cp37m-linux_x86_64.whl">
paddlepaddle-1.5.1-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> cuda8-cudnn7-openblas </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda8-cudnn7-openblas/paddlepaddle_gpu-1.5.1-cp27-cp27mu-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda8-cudnn7-openblas/paddlepaddle_gpu-1.5.1-cp27-cp27m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda8-cudnn7-openblas/paddlepaddle_gpu-1.5.1-cp35-cp35m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda8-cudnn7-openblas/paddlepaddle_gpu-1.5.1-cp36-cp36m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda8-cudnn7-openblas/paddlepaddle_gpu-1.5.1-cp37-cp37m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> cuda8-cudnn7-mkl </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda8-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post87-cp27-cp27mu-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda8-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post87-cp27-cp27m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda8-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post87-cp35-cp35m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda8-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post87-cp36-cp36m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda8-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post87-cp37-cp37m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> cuda9-cudnn7-mkl </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post97-cp27-cp27mu-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post97-cp27-cp27m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post97-cp35-cp35m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post97-cp36-cp36m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post97-cp37-cp37m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> cuda10-cudnn7-mkl </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post107-cp27-cp27mu-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1.post107-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post107-cp27-cp27m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1.post107-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post107-cp35-cp35m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1.post107-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post107-cp36-cp36m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1.post107-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-1.5.1.post107-cp37-cp37m-linux_x86_64.whl"> paddlepaddle_gpu-1.5.1.post107-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> win_cpu_openblas </td>
<td> - </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle-1.5.1-cp27-cp27m-win_amd64.whl">
paddlepaddle-1.5.1-cp27-cp27m-win_amd64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle-1.5.1-cp35-cp35m-win_amd64.whl">
paddlepaddle-1.5.1-cp35-cp35m-win_amd64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle-1.5.1-cp36-cp36m-win_amd64.whl">
paddlepaddle-1.5.1-cp36-cp36m-win_amd64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle-1.5.1-cp37-cp37m-win_amd64.whl">
paddlepaddle-1.5.1-cp37-cp37m-win_amd64.whl</a></td>
</tr>
<tr>
<td> win_cuda8_cudnn7_openblas </td>
<td> - </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post87-cp27-cp27m-win_amd64.whl">
paddlepaddle_gpu-1.5.1-cp27-cp27m-win_amd64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post87-cp35-cp35m-win_amd64.whl">
paddlepaddle_gpu-1.5.1-cp35-cp35m-win_amd64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post87-cp36-cp36m-win_amd64.whl">
paddlepaddle_gpu-1.5.1-cp36-cp36m-win_amd64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post87-cp37-cp37m-win_amd64.whl">
paddlepaddle_gpu-1.5.1-cp37-cp37m-win_amd64.whl</a></td>
</tr>
<tr>
<td> win_cuda9_cudnn7_openblas </td>
<td> - </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post97-cp27-cp27m-win_amd64.whl">
paddlepaddle_gpu-1.5.1-cp27-cp27m-win_amd64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post97-cp35-cp35m-win_amd64.whl">
paddlepaddle_gpu-1.5.1-cp35-cp35m-win_amd64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post97-cp36-cp36m-win_amd64.whl">
paddlepaddle_gpu-1.5.1-cp36-cp36m-win_amd64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post97-cp37-cp37m-win_amd64.whl">
paddlepaddle_gpu-1.5.1-cp37-cp37m-win_amd64.whl</a></td>
</tr>
<tr>
<td> mac_cpu </td>
<td> - </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-mac/paddlepaddle-1.5.1-cp27-cp27m-macosx_10_6_intel.whl">
paddlepaddle-1.5.1-cp27-cp27m-macosx_10_6_intel.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-mac/paddlepaddle-1.5.1-cp35-cp35m-macosx_10_6_intel.whl">
paddlepaddle-1.5.1-cp35-cp35m-macosx_10_6_intel.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-mac/paddlepaddle-1.5.1-cp36-cp36m-macosx_10_6_intel.whl">
paddlepaddle-1.5.1-cp36-cp36m-macosx_10_6_intel.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-cpu-mac/paddlepaddle-1.5.1-cp37-cp37m-macosx_10_6_intel.whl">
paddlepaddle-1.5.1-cp37-cp37m-macosx_10_6_intel.whl</a></td>
</tr>
</tbody>
</table>
</p>
<a name="ciwhls"></a>
</br></br>
## **Multi-version whl package list - dev**
<p align="center">
<table>
<thead>
<tr>
<th> Release Instruction </th>
<th> cp27-cp27mu </th>
<th> cp27-cp27m </th>
<th> cp35-cp35m </th>
<th> cp36-cp36m </th>
<th> cp37-cp37m </th>
</tr>
</thead>
<tbody>
<tr>
<td> cpu-mkl </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-cpu-mkl/paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl">
paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-cpu-mkl/paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl">
paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-cpu-mkl/paddlepaddle-latest-cp35-cp35m-linux_x86_64.whl">
paddlepaddle-latest-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-cpu-mkl/paddlepaddle-latest-cp36-cp36m-linux_x86_64.whl">
paddlepaddle-latest-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-cpu-mkl/paddlepaddle-latest-cp37-cp37m-linux_x86_64.whl">
paddlepaddle-latest-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> cpu-openblas </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-cpu-openblas/paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl">
paddlepaddle-latest-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-cpu-openblas/paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl"> paddlepaddle-latest-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-cpu-openblas/paddlepaddle-latest-cp35-cp35m-linux_x86_64.whl">
paddlepaddle-latest-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-cpu-openblas/paddlepaddle-latest-cp36-cp36m-linux_x86_64.whl">
paddlepaddle-latest-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-cpu-openblas/paddlepaddle-latest-cp37-cp37m-linux_x86_64.whl">
paddlepaddle-latest-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> cuda8-cudnn7-openblas </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda8-cudnn7-openblas/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda8-cudnn7-openblas/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda8-cudnn7-openblas/paddlepaddle_gpu-latest-cp35-cp35m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda8-cudnn7-openblas/paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda8-cudnn7-openblas/paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> cuda8-cudnn7-mkl </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda8-cudnn7-mkl/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda8-cudnn7-mkl/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda8-cudnn7-mkl/paddlepaddle_gpu-latest-cp35-cp35m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda8-cudnn7-mkl/paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda8-cudnn7-mkl/paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> cuda9-cudnn7-mkl </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-latest-cp35-cp35m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
<tr>
<td> cuda10-cudnn7-mkl </td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp27-cp27mu-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp27-cp27m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-latest-cp35-cp35m-linux_x86_64.whl"> paddlepaddle_gpu-latest-cp35-cp35m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl">
paddlepaddle_gpu-latest-cp36-cp36m-linux_x86_64.whl</a></td>
<td> <a href="https://paddle-wheel.bj.bcebos.com/latest-gpu-cuda10-cudnn7-mkl/paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl">
paddlepaddle_gpu-latest-cp37-cp37m-linux_x86_64.whl</a></td>
</tr>
</tbody>
</table>
</p>
</br></br>
## Execute the PaddlePaddle training program in Docker
***
Suppose you have written a PaddlePaddle program `train.py` in the current directory (such as /home/work; refer to [PaddlePaddle Book](https://github.com/PaddlePaddle/book/blob/develop/01.fit_a_line/README.cn.md) for how to write one). You can start training with the following commands:
cd /home/work
docker run -it -v $PWD:/work hub.baidubce.com/paddlepaddle/paddle /work/train.py
In the above commands, the `-it` parameter runs the container interactively; `-v $PWD:/work` mounts the current directory (the PWD variable in Linux expands to the absolute path of the current directory) to the `/work` directory inside the container; `hub.baidubce.com/paddlepaddle/paddle` specifies the image to use; finally, `/work/train.py` is the command executed inside the container, i.e. the training program.
Of course, you can also enter into the Docker container and execute or debug your code interactively:
docker run -it -v $PWD:/work hub.baidubce.com/paddlepaddle/paddle /bin/bash
cd /work
python train.py
**Note**: To reduce the image size, vim is not installed in the PaddlePaddle Docker image by default. If you want to edit code inside the container, first run `apt-get install -y vim` in the container.
</br></br>
## Start PaddlePaddle Book tutorial with Docker
***
Use Docker to quickly launch a local Jupyter Notebook containing the PaddlePaddle official Book tutorial, which can be viewed on the web. PaddlePaddle Book is an interactive Jupyter Notebook for users and developers. If you want to learn more about deep learning, PaddlePaddle Book is definitely your best choice. You can read tutorials or create and share interactive documents with code, formulas, charts, and text.
We provide a Docker image that runs the PaddlePaddle Book directly:
`docker run -p 8888:8888 paddlepaddle/book`
Domestic users can use the following registry mirror to speed up access:
`docker run -p 8888:8888 hub.baidubce.com/paddlepaddle/book`
Then enter the following URL in your browser:
`http://localhost:8888/`
It's that simple and bon voyage! For further questions, please refer to the [FAQ](#FAQ).
</br></br>
## Perform GPU training using Docker
***
In order to ensure that the GPU driver works properly in the image, we recommend using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) to run the image. Don't forget to install the latest GPU drivers on your physical machine in advance.
`nvidia-docker run -it -v $PWD:/work hub.baidubce.com/paddlepaddle/paddle:latest-gpu /bin/bash`
**Note: If you don't have nvidia-docker installed, you can try the following to mount the CUDA library and Linux devices into the Docker container:**
export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') \
$(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
docker run ${CUDA_SO} \
${DEVICES} -it hub.baidubce.com/paddlepaddle/paddle:latest-gpu
**About AVX:**
AVX is a set of CPU instructions that speeds up PaddlePaddle's computations. The latest PaddlePaddle Docker image is compiled with AVX enabled by default, so if your computer does not support AVX, you need to compile a no-avx version of PaddlePaddle separately.
The following instructions can check if the Linux computer supports AVX:
`if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi`
If the output is No, you need to choose an image built without AVX.
Quick Start
=============
Quick Installation
--------------------
PaddlePaddle supports quick installation with pip. Execute the following command to install the CPU version:
.. code-block:: bash
pip install paddlepaddle
If you need to install the GPU version, or want more detailed installation methods, please refer to the `Installation Instructions <../beginners_guide/install/index_en.html>`_
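Alternatively, the GPU version can be installed just as quickly with pip:
.. code-block:: bash
pip install -U paddlepaddle-gpu
In that case, please make sure beforehand that the GPU driver and the `CUDA 9 <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/>`_ , `cuDNN 7.3 <https://docs.nvidia.com/deeplearning/sdk/cudnn-install/>`_ and `NCCL2 <https://developer.nvidia.com/nccl/nccl-download/>`_ dependencies are correctly installed and configured as described on the NVIDIA official website.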
Quick Usage
-------------
First, you need to import the fluid library
.. code-block:: python
import paddle.fluid as fluid
* Tensor Operations
The following simple examples may help you get started with Fluid quickly:
1. Use Fluid to create a one-dimensional array with five elements, each of which is 1
.. code-block:: python
# define the dimensions and data type of the array; the 'shape' parameter can be modified to define an array of any size
data = fluid.layers.ones(shape=[5], dtype='int64')
# compute on the CPU
place = fluid.CPUPlace()
# create an executor
exe = fluid.Executor(place)
# execute computation
ones_result = exe.run(fluid.default_main_program(),
# get data
fetch_list=[data],
return_numpy=True)
# output the results
print(ones_result[0])
You can get the result:
.. code-block:: text
[1 1 1 1 1]
2. Use Fluid to add two arrays element-wise
.. code-block:: python
# call elementwise_add to add the generated arrays element-wise
add = fluid.layers.elementwise_add(data,data)
# define computation place
place = fluid.CPUPlace()
exe = fluid.Executor(place)
# execute computation
add_result = exe.run(fluid.default_main_program(),
fetch_list=[add],
return_numpy=True)
# output the results
print (add_result[0])
You can get the result:
.. code-block:: text
[2 2 2 2 2]
3. Use Fluid to cast the data type
.. code-block:: python
# cast a one-dimensional int array to float64
cast = fluid.layers.cast(x=data, dtype='float64')
# define computation place to execute computation
place = fluid.CPUPlace()
exe = fluid.Executor(place)
cast_result = exe.run(fluid.default_main_program(),
fetch_list=[cast],
return_numpy=True)
# output the results
print(cast_result[0])
You can get the result:
.. code-block:: text
[1. 1. 1. 1. 1.]
Run a Linear Regression Model
-------------------------------------
The simple examples above show how to operate on data with Fluid. Now create a test.py and copy the following code.
This is a simple linear regression model that helps us quickly solve a linear equation in four variables.
.. code-block:: python
#load the library
import paddle.fluid as fluid
import numpy as np
#generate data
np.random.seed(0)
outputs = np.random.randint(5, size=(10, 4))
res = []
for i in range(10):
# assume the equation is y=4a+6b+7c+2d
y = 4*outputs[i][0]+6*outputs[i][1]+7*outputs[i][2]+2*outputs[i][3]
res.append([y])
# define data
train_data=np.array(outputs).astype('float32')
y_true = np.array(res).astype('float32')
#define the network
x = fluid.layers.data(name="x",shape=[4],dtype='float32')
y = fluid.layers.data(name="y",shape=[1],dtype='float32')
y_predict = fluid.layers.fc(input=x,size=1,act=None)
#define loss function
cost = fluid.layers.square_error_cost(input=y_predict,label=y)
avg_cost = fluid.layers.mean(cost)
#define optimization methods
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.05)
sgd_optimizer.minimize(avg_cost)
#initialize parameters
cpu = fluid.CPUPlace()
exe = fluid.Executor(cpu)
exe.run(fluid.default_startup_program())
##start training and iterate for 500 times
for i in range(500):
outs = exe.run(
feed={'x':train_data,'y':y_true},
fetch_list=[y_predict.name,avg_cost.name])
if i%50==0:
print ('iter={:.0f},cost={}'.format(i,outs[1][0]))
#save the training result
params_dirname = "result"
fluid.io.save_inference_model(params_dirname, ['x'], [y_predict], exe)
# start inference
infer_exe = fluid.Executor(cpu)
inference_scope = fluid.Scope()
# load the trained model
with fluid.scope_guard(inference_scope):
[inference_program, feed_target_names,
fetch_targets] = fluid.io.load_inference_model(params_dirname, infer_exe)
# generate test data
test = np.array([[[9],[5],[2],[10]]]).astype('float32')
# inference
results = infer_exe.run(inference_program,
feed={"x": test},
fetch_list=fetch_targets)
# given the input [9, 5, 2, 10], output the value of y=4*9+6*5+7*2+2*10
print ("9a+5b+2c+10d={}".format(results[0][0]))
You can get the result:
.. code-block:: text
9a+5b+2c+10d=[99.946]
The output should be a value close to 100; the result may vary slightly between runs.