Merge remote-tracking branch 'upstream/develop' into develop

f863e088 · gongweibao · 23a236bc · 2f6da124
41 changed files
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -32,3 +32,11 @@
        entry: python pre-commit-hooks/convert_markdown_into_html.py
        language: system
        files: \.md$
+-   repo: local
+    hooks:
+    -  id: convert-markdown-into-ipynb
+       name: convert-markdown-into-ipynb
+       description: Convert README.md into README.ipynb and README.en.md into README.en.ipynb
+       entry: ./pre-commit-hooks/convert_markdown_into_ipynb.sh
+       language: system
+       files: \.md$
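
The hook above runs `convert_markdown_into_ipynb.sh`, whose contents are not part of this diff. As a rough illustration of the kind of transformation involved, the Python sketch below splits Markdown text into notebook cells: fenced `python` blocks become code cells and everything else becomes markdown cells. It is not the implementation of the `markdown-to-ipynb` tool; the function name `markdown_to_notebook` is hypothetical, and the output contains only fields that are visible in the generated `fit_a_line/README.ipynb`.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep this example easy to embed


def markdown_to_notebook(markdown_text):
    """Split Markdown into notebook cells (hypothetical helper, not the real tool)."""
    cells = []
    pending = []
    in_code = False

    def flush(cell_type):
        if not pending:
            return
        # A notebook "source" is a list of lines, each keeping its newline.
        source = [line + "\n" for line in pending]
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
        cells.append(cell)
        del pending[:]

    for line in markdown_text.splitlines():
        stripped = line.strip()
        if not in_code and stripped == FENCE + "python":
            flush("markdown")
            in_code = True
        elif in_code and stripped == FENCE:
            flush("code")
            in_code = False
        else:
            pending.append(line)
    flush("code" if in_code else "markdown")

    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


if __name__ == "__main__":
    text = "# Title\n\n" + FENCE + "python\nx = 1\n" + FENCE + "\nDone\n"
    print(json.dumps(markdown_to_notebook(text), indent=2))
```

Because the hook matches `\.md$`, a conversion of this kind would run for every staged Markdown file, which is consistent with the description's mention of both `README.md` and `README.en.md`.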
--- a/.travis.yml
+++ b/.travis.yml
@@ -14,8 +14,10 @@ addons:
      - python
      - python-pip
      - python2.7-dev
+      - golang
 before_install:
  -  pip install virtualenv pre-commit
+  -  GOPATH=/tmp/go go get -u github.com/wangkuiyi/ipynb/markdown-to-ipynb
 script:
  - travis/precommit.sh
 notifications:

--- a/fit_a_line/README.en.ipynb
+++ b/fit_a_line/README.en.ipynb
--- a/fit_a_line/README.en.md
+++ b/fit_a_line/README.en.md
 # Linear Regression
 Let us begin the tutorial with a classical problem called Linear Regression \[[1](#References)\]. In this chapter, we will train a model from a realistic dataset to predict home prices. Some important concepts in Machine Learning will be covered through this example.

-The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).

 ## Problem Setup
 Suppose we have a dataset of $n$ real estate properties. These real estate properties will be referred to as *homes* in this chapter for clarity.

--- a/fit_a_line/README.ipynb
+++ b/fit_a_line/README.ipynb
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
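+        "# Linear Regression\n",
+        "Let us begin this tutorial with the classic linear regression (Linear Regression \\[[1](#References)\\]) model. In this chapter, you will build a house-price prediction model from a real dataset and learn several important machine learning concepts along the way.\n",
+        "\n",
+        "The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).\n",
+        "\n",
+        "## Background\n",
+        "Given a dataset of size $n$, ${\\{y_{i}, x_{i1}, ..., x_{id}\\}}_{i=1}^{n}$, where $x_{i1}, \\ldots, x_{id}$ are the values of the $d$ attributes of the $i$-th sample and $y_i$ is the target to be predicted for that sample, the linear regression model assumes that the target $y_i$ can be described by a linear combination of the attributes, that is,\n",
+        "\n",
+        "$$y_i = \\omega_1x_{i1} + \\omega_2x_{i2} + \\ldots + \\omega_dx_{id} + b,  i=1,\\ldots,n$$\n",
+        "\n",
+        "For example, in the house-price prediction problem we are about to model, $x_{ij}$ are the attributes describing house $i$ (such as the number of rooms, the number of nearby schools and hospitals, and traffic conditions), and $y_i$ is the price of the house.\n",
+        "\n",
+        "At first glance this assumption seems far too simple, since the true relationships among variables are rarely linear. However, because linear regression is simple in form and easy to model and analyze, it is widely used in practice. Many classic statistical learning and machine learning textbooks \\[[2,3,4](#References)\\] devote a separate chapter to linear models.\n",
+        "\n",
+        "## Results\n",
+        "We train and predict with the Boston housing dataset obtained from the [UCI Housing Data Set](https://archive.ics.uci.edu/ml/datasets/Housing). The scatter plot below shows the model's predictions for some of the house prices. The x-coordinate of each point is the true median price of a class of houses, and the y-coordinate is the price predicted by the linear regression model from the features; a point falls on the dashed line when the two values are exactly equal. The more accurate the model's predictions, the closer the points lie to the dashed line.\n",
+        "\u003cp align=\"center\"\u003e\n",
+        "    \u003cimg src = \"image/predictions.png\" width=400\u003e\u003cbr/\u003e\n",
+        "    Figure 1. Predicted value vs. true value\n",
+        "\u003c/p\u003e\n",
+        "\n",
+        "## Model Overview\n",
+        "\n",
+        "### Model Definition\n",
+        "\n",
+        "In the Boston housing dataset, each record has 14 values: the first 13 describe various properties of the houses, i.e. $x_i$ in the model; the last is the median price of that class of houses, which we want to predict, i.e. $y_i$ in the model. Our model can therefore be written as:\n",
+        "\n",
+        "$$\\hat{Y} = \\omega_1X_{1} + \\omega_2X_{2} + \\ldots + \\omega_{13}X_{13} + b$$\n",
+        "\n",
+        "$\\hat{Y}$ denotes the model's prediction, to distinguish it from the true value $Y$. The parameters to be learned are $\\omega_1, \\ldots, \\omega_{13}, b$.\n",
+        "\n",
+        "After building the model, we need to give it an optimization objective so that the learned parameters make the prediction $\\hat{Y}$ as close as possible to the true value $Y$. Here we introduce the concept of a loss function ([Loss Function](https://en.wikipedia.org/wiki/Loss_function), also called a Cost Function). Given the target value $y_{i}$ of any data sample and the prediction $\\hat{y_{i}}$ produced by the model, the loss function outputs a non-negative real value, which is typically used to reflect the size of the model's error.\n",
+        "\n",
+        "For linear regression, the most common loss function is the mean squared error (Mean Squared Error, [MSE](https://en.wikipedia.org/wiki/Mean_squared_error)), which has the form:\n",
+        "\n",
+        "$$MSE=\\frac{1}{n}\\sum_{i=1}^{n}{(\\hat{Y_i}-Y_i)}^2$$\n",
+        "\n",
+        "That is, for a test set of size $n$, $MSE$ is the mean of the squared prediction errors over the $n$ samples.\n",
+        "\n",
+        "### Training Process\n",
+        "\n",
+        "After defining the model structure, we train the model through the following steps:\n",
+        " 1. Initialize the parameters, including the weights $\\omega_i$ and the bias $b$ (for example, with zero mean and unit variance).\n",
+        " 2. Run forward propagation through the network to compute the network output and the loss function.\n",
+        " 3. Perform error backpropagation ([backpropagation](https://en.wikipedia.org/wiki/Backpropagation)) based on the loss function, passing the network error from the output layer back toward the input layer by layer, and update the parameters in the network.\n",
+        " 4. Repeat steps 2 and 3 until the training error reaches a specified level or the number of training passes reaches a set value.\n",
+        "\n",
+        "## Dataset\n",
+        "\n",
+        "### Wrapping the Dataset Interface\n",
+        "First, load the required packages.\n",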
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "import paddle.v2 as paddle\n",
+        "import paddle.v2.dataset.uci_housing as uci_housing\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
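+        "The uci_housing module gives us access to the [UCI Housing Data Set](https://archive.ics.uci.edu/ml/datasets/Housing).\n",
+        "\n",
+        "The uci_housing module encapsulates:\n",
+        "\n",
+        "1. Data download. The downloaded data is saved at ~/.cache/paddle/dataset/uci_housing/housing.data.\n",
+        "2. [Data preprocessing](#Data-Preprocessing).\n",
+        "\n",
+        "\n",
+        "### About the Dataset\n",
+        "The dataset has 506 rows. Each row contains information about one class of houses in the Boston suburbs and the median price of that class. The meaning of each attribute is as follows:\n",
+        "\n",
+        "| Attribute | Description | Type |\n",
+        "| ------| ------ | ------ |\n",
+        "| CRIM | Per capita crime rate by town | Continuous |\n",
+        "| ZN | Proportion of residential land zoned for lots over 25,000 sq. ft. | Continuous |\n",
+        "| INDUS | Proportion of non-retail business land | Continuous |\n",
+        "| CHAS | Whether the tract borders the Charles River | Discrete: 1 = borders the river; 0 = does not |\n",
+        "| NOX | Nitric oxide concentration | Continuous |\n",
+        "| RM | Average number of rooms per dwelling | Continuous |\n",
+        "| AGE | Proportion of owner-occupied units built before 1940 | Continuous |\n",
+        "| DIS | Weighted distance to five Boston employment centers | Continuous |\n",
+        "| RAD | Index of accessibility to radial highways | Continuous |\n",
+        "| TAX | Full-value property tax rate | Continuous |\n",
+        "| PTRATIO | Pupil-teacher ratio | Continuous |\n",
+        "| B | 1000(BK - 0.63)^2, where BK is the proportion of Black residents | Continuous |\n",
+        "| LSTAT | Proportion of lower-income population | Continuous |\n",
+        "| MEDV | Median price of homes of the same class | Continuous |\n",
+        "\n",
+        "### Data Preprocessing\n",
+        "#### Continuous and Discrete Values\n",
+        "Looking at the data, our first observation is that, of the 13 attributes, 12 are continuous and 1 is discrete (CHAS). Although discrete values are often written as numbers such as 0, 1, and 2, their meaning differs from that of continuous values, because the differences between them carry no real meaning. For example, if we use 0, 1, and 2 to represent red, green, and blue, we cannot conclude that blue is farther from red than green is. So for a discrete attribute with $d$ possible values, we usually convert it into $d$ binary attributes that take the value 0 or 1, or map each possible value to a multi-dimensional vector. Here, however, CHAS is already a binary attribute, which saves us the trouble.\n",
+        "\n",
+        "#### Attribute Normalization\n",
+        "Another fact that is easy to notice is that the value ranges of the attributes differ greatly (see Figure 2). For example, attribute B ranges over [0.32, 396.90], while attribute NOX ranges over [0.3850, 0.8170]. This calls for a common operation: normalization. The goal of normalization is to scale the value range of each attribute to roughly the same interval, such as [-0.5, 0.5]. Here we use a very common approach: subtract the mean, then divide by the original value range.\n",
+        "\n",
+        "There are at least three reasons to normalize (or apply [Feature scaling](https://en.wikipedia.org/wiki/Feature_scaling)):\n",
+        "- Value ranges that are too large or too small can cause floating-point overflow or underflow during computation.\n",
+        "- Different value ranges give different attributes different importance to the model (at least in the early stage of training), and this implicit assumption is often unreasonable. It makes optimization harder and greatly lengthens training time.\n",
+        "- Many machine learning techniques and models (such as L1 and L2 regularization and the Vector Space Model) assume that all attribute values have roughly zero mean and similar ranges.\n",
+        "\n",
+        "\u003cp align=\"center\"\u003e\n",
+        "    \u003cimg src = \"image/ranges.png\" width=550\u003e\u003cbr/\u003e\n",
+        "    Figure 2. Value ranges of the attributes\n",
+        "\u003c/p\u003e\n",
+        "\n",
+        "#### Preparing the Training and Test Sets\n",
+        "We split the dataset into two parts: one is used to adjust the model's parameters, i.e. to train the model, and the model's error on it is called the **training error**; the other is used for testing, and the model's error on it is called the **test error**. The purpose of training is to find patterns in the training data in order to predict unseen data, so the test error is the better indicator of model performance. The split ratio must balance two factors: more training data lowers the variance of the parameter estimates, giving a more reliable model, while more test data lowers the variance of the test error, giving a more reliable test error. In this example we use a split ratio of $8:2$.\n",
+        "\n",
+        "\n",
+        "When training more complex models, we often use one more dataset: the validation set. Complex models usually have hyperparameters ([Hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_optimization)) to tune, so we train several models with different hyperparameter combinations, compare their performance on the validation set to choose the best combination, and only then evaluate the model trained with those hyperparameters on the test set. Because the model trained in this chapter is simple, we skip this process for now.\n",
+        "\n",
+        "## Training\n",
+        "\n",
+        "`fit_a_line/trainer.py` demonstrates the overall training process.\n",
+        "\n",
+        "### Initialize PaddlePaddle\n",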
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "paddle.init(use_gpu=False, trainer_count=1)\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "### Model Configuration\n",
+        "\n",
+        "The linear regression model is simply a fully-connected layer (`fc_layer`) with a linear activation function (`LinearActivation`):\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(13))\n",
+        "y_predict = paddle.layer.fc(input=x,\n",
+        "                                size=1,\n",
+        "                                act=paddle.activation.Linear())\n",
+        "y = paddle.layer.data(name='y', type=paddle.data_type.dense_vector(1))\n",
+        "cost = paddle.layer.regression_cost(input=y_predict, label=y)\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Create Parameters\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "parameters = paddle.parameters.create(cost)\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "### Create Trainer\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "optimizer = paddle.optimizer.Momentum(momentum=0)\n",
+        "\n",
+        "trainer = paddle.trainer.SGD(cost=cost,\n",
+        "                             parameters=parameters,\n",
+        "                             update_equation=optimizer)\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
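+        "### Read Data and Print Intermediate Training Information\n",
+        "\n",
+        "PaddlePaddle provides a\n",
+        "[reader mechanism](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/design/reader)\n",
+        "for reading data. The data returned by a reader can contain multiple columns, so we need a Python dict that maps\n",
+        "column indices to the data layers in the network.\n",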
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "feeding={'x': 0, 'y': 1}\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "In addition, we can provide an event handler to print the training progress:\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "# event_handler prints training and testing info.\n",
+        "# print(...) with a single formatted string behaves the same on Python 2 and Python 3.\n",
+        "def event_handler(event):\n",
+        "    if isinstance(event, paddle.event.EndIteration):\n",
+        "        if event.batch_id % 100 == 0:\n",
+        "            print(\"Pass %d, Batch %d, Cost %f\" % (\n",
+        "                event.pass_id, event.batch_id, event.cost))\n",
+        "\n",
+        "    if isinstance(event, paddle.event.EndPass):\n",
+        "        result = trainer.test(\n",
+        "            reader=paddle.batch(\n",
+        "                uci_housing.test(), batch_size=2),\n",
+        "            feeding=feeding)\n",
+        "        print(\"Test %d, Cost %f\" % (event.pass_id, result.cost))\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "### Start Training\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "trainer.train(\n",
+        "    reader=paddle.batch(\n",
+        "        paddle.reader.shuffle(\n",
+        "            uci_housing.train(), buf_size=500),\n",
+        "        batch_size=2),\n",
+        "    feeding=feeding,\n",
+        "    event_handler=event_handler,\n",
+        "    num_passes=30)\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "## Summary\n",
+        "In this chapter, we used the Boston housing dataset to introduce the basic concepts of linear regression and showed how to train and test the model with PaddlePaddle. Many models and techniques evolved from the simple linear regression model, so it is important to understand the principles and limitations of linear models.\n",
+        "\n",
+        "\n",
+        "## References\n",
+        "1. https://en.wikipedia.org/wiki/Linear_regression\n",
+        "2. Friedman J, Hastie T, Tibshirani R. The elements of statistical learning[M]. Springer, Berlin: Springer series in statistics, 2001.\n",
+        "3. Murphy K P. Machine learning: a probabilistic perspective[M]. MIT press, 2012.\n",
+        "4. Bishop C M. Pattern recognition[J]. Machine Learning, 2006, 128.\n",
+        "\n",
+        "\u003cbr/\u003e\n",
+        "\u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\"\u003e\u003cimg alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png\" /\u003e\u003c/a\u003e\u003cbr /\u003e\u003cspan xmlns:dct=\"http://purl.org/dc/terms/\" href=\"http://purl.org/dc/dcmitype/Text\" property=\"dct:title\" rel=\"dct:type\"\u003eThis tutorial\u003c/span\u003e by \u003ca xmlns:cc=\"http://creativecommons.org/ns#\" href=\"http://book.paddlepaddle.org\" property=\"cc:attributionName\" rel=\"cc:attributionURL\"\u003ePaddlePaddle\u003c/a\u003e is licensed under a \u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\"\u003eCreative Commons Attribution-NonCommercial-ShareAlike 4.0 International License\u003c/a\u003e.\n"
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.6.0"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
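
The notebook above describes training as four steps: initialize the parameters, run a forward pass, propagate the error back and update, and repeat. The sketch below walks through those steps for a linear model with an MSE loss, using plain Python and full-batch gradient descent. It is an illustration only and does not reflect how `paddle.trainer.SGD` is implemented; the function names, the learning rate, and the initialization scale are assumptions chosen for the example.

```python
import random


def predict(weights, bias, features):
    """Linear model: y_hat = w_1*x_1 + ... + w_d*x_d + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(weights, bias, samples):
    """Mean squared error over (features, target) pairs."""
    errors = [predict(weights, bias, x) - y for x, y in samples]
    return sum(e * e for e in errors) / float(len(samples))


def train(samples, learning_rate=0.1, num_passes=500, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0][0])
    n = float(len(samples))
    # Step 1: initialize the parameters (small random weights, zero bias).
    weights = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    bias = 0.0
    # Step 4: repeat steps 2 and 3 for a fixed number of passes.
    for _ in range(num_passes):
        # Step 2: forward pass, giving the per-sample prediction errors.
        errors = [predict(weights, bias, x) - y for x, y in samples]
        # Step 3: gradient of the MSE with respect to each parameter, then update.
        grad_w = [2.0 / n * sum(e * x[j] for e, (x, _) in zip(errors, samples))
                  for j in range(dim)]
        grad_b = 2.0 / n * sum(errors)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


if __name__ == "__main__":
    # Synthetic data generated from y = 2*a - 3*b + 1 on a small grid.
    grid = [-1.0, 0.0, 1.0]
    data = [([a, b], 2.0 * a - 3.0 * b + 1.0) for a in grid for b in grid]
    w, b = train(data)
    print(w, b, mse(w, b, data))
```

With normalized inputs of similar range, a single learning rate works for every weight, which is one practical reason the chapter normalizes the attributes before training.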
--- a/fit_a_line/README.md
+++ b/fit_a_line/README.md
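 # Linear Regression
 Let us begin this tutorial with the classic linear regression (Linear Regression \[[1](#References)\]) model. In this chapter, you will build a house-price prediction model from a real dataset and learn several important machine learning concepts along the way.

-The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). First-time users should refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).

 ## Background
 Given a dataset of size $n$, ${\{y_{i}, x_{i1}, ..., x_{id}\}}_{i=1}^{n}$, where $x_{i1}, \ldots, x_{id}$ are the values of the $d$ attributes of the $i$-th sample and $y_i$ is the target to be predicted for that sample, the linear regression model assumes that the target $y_i$ can be described by a linear combination of the attributes, that is,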
@@ -15,8 +15,8 @@ $$y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b,  i=1,\ldo
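 ## Results
 We train and predict with the Boston housing dataset obtained from the [UCI Housing Data Set](https://archive.ics.uci.edu/ml/datasets/Housing). The scatter plot below shows the model's predictions for some of the house prices. The x-coordinate of each point is the true median price of a class of houses, and the y-coordinate is the price predicted by the linear regression model from the features; a point falls on the dashed line when the two values are exactly equal. The more accurate the model's predictions, the closer the points lie to the dashed line.
 <p align="center">
-	<img src = "image/predictions.png" width=400><br/>
-	Figure 1. Predicted value vs. true value
+    <img src = "image/predictions.png" width=400><br/>
+    Figure 1. Predicted value vs. true value
 </p>

 ## Model Overview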
@@ -96,8 +96,8 @@ import paddle.v2.dataset.uci_housing as uci_housing
 - Many machine learning techniques and models (such as L1 and L2 regularization and the Vector Space Model) assume that all attribute values have roughly zero mean and similar ranges.

 <p align="center">
-	<img src = "image/ranges.png" width=550><br/>
-	Figure 2. Value ranges of the attributes
+    <img src = "image/ranges.png" width=550><br/>
+    Figure 2. Value ranges of the attributes
 </p>
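
The normalization this chapter describes (subtract the mean, then divide by the original value range) can be sketched in a few lines of plain Python. This is an illustration rather than the preprocessing code inside `uci_housing`; the function name `normalize_columns` and the handling of constant columns are assumptions made for the example.

```python
def normalize_columns(rows):
    """Scale every column to (value - mean) / (max - min)."""
    num_columns = len(rows[0])
    normalized = [list(row) for row in rows]
    for j in range(num_columns):
        column = [row[j] for row in rows]
        mean = sum(column) / float(len(column))
        value_range = max(column) - min(column)
        for i, value in enumerate(column):
            # A constant column has zero range; map it to 0.0 rather than divide by zero.
            normalized[i][j] = (value - mean) / value_range if value_range else 0.0
    return normalized


if __name__ == "__main__":
    # Two rows holding the extreme values quoted for attributes B and NOX.
    print(normalize_columns([[0.32, 0.3850], [396.90, 0.8170]]))
```

After this scaling every column has zero mean and a spread of at most 1, so attributes such as B and NOX end up on comparable scales.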

 #### Preparing the Training and Test Sets
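
The chapter uses an $8:2$ split between training and test data. The sketch below shows one way to produce such a split. Whether `uci_housing` shuffles the rows before splitting is not stated in this diff, so the shuffle, the seed, and the function name `split_dataset` are all assumptions of this example.

```python
import random


def split_dataset(rows, train_ratio=0.8, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    split_at = int(len(shuffled) * train_ratio)
    return shuffled[:split_at], shuffled[split_at:]


if __name__ == "__main__":
    train_rows, test_rows = split_dataset(list(range(506)))
    print(len(train_rows), len(test_rows))
```

Fixing the seed keeps the split reproducible, so training and test errors from different runs refer to the same partition.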

--- a/fit_a_line/index.en.html
+++ b/fit_a_line/index.en.html
@@ -43,7 +43,7 @@
 # Linear Regression
 Let us begin the tutorial with a classical problem called Linear Regression \[[1](#References)\]. In this chapter, we will train a model from a realistic dataset to predict home prices. Some important concepts in Machine Learning will be covered through this example.

-The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).

 ## Problem Setup
 Suppose we have a dataset of $n$ real estate properties. These real estate properties will be referred to as *homes* in this chapter for clarity.

--- a/fit_a_line/index.html
+++ b/fit_a_line/index.html
@@ -43,7 +43,7 @@
 # 线性回归
 让我们从经典的线性回归（Linear Regression \[[1](#参考文献)\]）模型开始这份教程。在这一章里，你将使用真实的数据集建立起一个房价预测模型，并且了解到机器学习中的若干重要概念。

-本教程源代码目录在[book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍
 给定一个大小为$n$的数据集  ${\{y_{i}, x_{i1}, ..., x_{id}\}}_{i=1}^{n}$，其中$x_{i1}, \ldots, x_{id}$是第$i$个样本$d$个属性上的取值，$y_i$是该样本待预测的目标。线性回归模型假设目标$y_i$可以被属性间的线性组合描述，即
@@ -57,8 +57,8 @@ $$y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b,  i=1,\ldo
 ## 效果展示
 我们使用从[UCI Housing Data Set](https://archive.ics.uci.edu/ml/datasets/Housing)获得的波士顿房价数据集进行模型的训练和预测。下面的散点图展示了使用模型对部分房屋价格进行的预测。其中，每个点的横坐标表示同一类房屋真实价格的中位数，纵坐标表示线性回归模型根据特征预测的结果，当二者值完全相等的时候就会落在虚线上。所以模型预测得越准确，则点离虚线越近。
 <p align="center">
-	<img src = "image/predictions.png" width=400><br/>
-	图1. 预测值 V.S. 真实值
+    <img src = "image/predictions.png" width=400><br/>
+    图1. 预测值 V.S. 真实值
 </p>

 ## 模型概览
@@ -138,8 +138,8 @@ import paddle.v2.dataset.uci_housing as uci_housing
 - 很多的机器学习技巧/模型（例如L1，L2正则项，向量空间模型-Vector Space Model）都基于这样的假设：所有的属性取值都差不多是以0为均值且取值范围相近的。

 <p align="center">
-	<img src = "image/ranges.png" width=550><br/>
-	图2. 各维属性的取值范围
+    <img src = "image/ranges.png" width=550><br/>
+    图2. 各维属性的取值范围
 </p>

 #### 整理训练集与测试集

--- a/image_classification/README.en.md
+++ b/image_classification/README.en.md
 Image Classification
 =======================

-The source code of this chapter is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). For the first-time users, please refer to PaddlePaddle [Installation Tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) for installation instructions.
+The source code of this chapter is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). For the first-time users, please refer to PaddlePaddle [Installation Tutorial](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst) for installation instructions.

 ## Background


--- a/image_classification/README.md
+++ b/image_classification/README.md
 Image Classification
 =======

-The source code for this tutorial is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). First-time users should refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). First-time users should refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).

 ## Background

@@ -173,24 +173,24 @@ paddle.init(use_gpu=False, trainer_count=1)

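 1. Define the data input and its dimensions

-	The network input is defined as a `data_layer`; in image classification it holds the image pixel information. CIFAR10 images are 32x32 RGB color images with 3 channels, so the input size is 3072 (3x32x32) and the number of classes is 10, i.e. 10-way classification.
+    The network input is defined as a `data_layer`; in image classification it holds the image pixel information. CIFAR10 images are 32x32 RGB color images with 3 channels, so the input size is 3072 (3x32x32) and the number of classes is 10, i.e. 10-way classification.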

-	```python
+    ```python
    datadim = 3 * 32 * 32
    classdim = 10

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))
-	```
+    ```

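 2. Define the core module of the VGG network

-	```python
-	net = vgg_bn_drop(image)
-	```
-	The input to the VGG core module is the data layer. `vgg_bn_drop` defines a 16-layer VGG structure in which each convolution is followed by a BN layer and a Dropout layer. The detailed definition is as follows:
+    ```python
+    net = vgg_bn_drop(image)
+    ```
+    The input to the VGG core module is the data layer. `vgg_bn_drop` defines a 16-layer VGG structure in which each convolution is followed by a BN layer and a Dropout layer. The detailed definition is as follows: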

-	```python
+    ```python
    def vgg_bn_drop(input):
        def conv_block(ipt, num_filter, groups, dropouts, num_channels=None):
            return paddle.networks.img_conv_group(
@@ -219,33 +219,33 @@ paddle.init(use_gpu=False, trainer_count=1)
            layer_attr=paddle.attr.Extra(drop_rate=0.5))
        fc2 = paddle.layer.fc(input=bn, size=512, act=paddle.activation.Linear())
        return fc2
-	```
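+    ```

-	2.1. First, a group of convolutional layers is defined, namely conv_block. The convolution kernel size is 3x3, the pooling window size is 2x2, and the pooling stride is 2. groups determines how many consecutive convolutions each VGG block performs, and dropouts specifies the Dropout probabilities. The `img_conv_group` used here is a module predefined in `paddle.networks`, composed of several `Conv->BN->ReLu->Dropout` groups followed by one `Pooling` group.
+    2.1. First, a group of convolutional layers is defined, namely conv_block. The convolution kernel size is 3x3, the pooling window size is 2x2, and the pooling stride is 2. groups determines how many consecutive convolutions each VGG block performs, and dropouts specifies the Dropout probabilities. The `img_conv_group` used here is a module predefined in `paddle.networks`, composed of several `Conv->BN->ReLu->Dropout` groups followed by one `Pooling` group.

-	2.2. Five groups of convolutions follow, i.e. five conv_blocks. The first and second groups use two consecutive convolutions; the third, fourth, and fifth groups use three consecutive convolutions. The Dropout probability after the last convolution in each group is 0, i.e. Dropout is not used there.
+    2.2. Five groups of convolutions follow, i.e. five conv_blocks. The first and second groups use two consecutive convolutions; the third, fourth, and fifth groups use three consecutive convolutions. The Dropout probability after the last convolution in each group is 0, i.e. Dropout is not used there.

-	2.3. Finally, two 512-dimensional fully-connected layers are attached.
+    2.3. Finally, two 512-dimensional fully-connected layers are attached.

 3. Define the classifier

-	The VGG network above extracts high-level features, which a fully-connected layer then maps to a vector whose size equals the number of classes; Softmax normalization then yields the probability of each class. Together these can be called the classifier.
+    The VGG network above extracts high-level features, which a fully-connected layer then maps to a vector whose size equals the number of classes; Softmax normalization then yields the probability of each class. Together these can be called the classifier.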

-	```python
+    ```python
    out = paddle.layer.fc(input=net,
                          size=classdim,
                          act=paddle.activation.Softmax())
-	```
+    ```

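 4. Define the loss function and the network output

-	Supervised training requires the class label of each input image, which is likewise defined with `paddle.layer.data`. During training, multi-class cross-entropy is used as the loss function and serves as the network output; at prediction time, the network output is the probability produced by the classifier.
+    Supervised training requires the class label of each input image, which is likewise defined with `paddle.layer.data`. During training, multi-class cross-entropy is used as the loss function and serves as the network output; at prediction time, the network output is the probability produced by the classifier.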

-	```python
+    ```python
    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)
-	```
+    ```
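
To make the dimensions in the steps above concrete, the following sketch traces the feature-map shape through the five `conv_block` groups for a 3x32x32 CIFAR10 input. It does not call PaddlePaddle. The filter counts and group sizes are taken from the `vgg_bn_drop` listing in the deprecated README later in this diff and may differ in the current version. The sketch also assumes that the 3x3 convolutions are padded so that only the 2x2, stride-2 pooling changes the height and width. The function name `vgg_feature_shapes` is hypothetical.

```python
def vgg_feature_shapes(height=32, width=32, channels=3):
    """Trace (channels, height, width) after each of the five conv_blocks."""
    # (num_filter, groups) per conv_block; values assumed from the deprecated listing.
    blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]
    shapes = [(channels, height, width)]
    for num_filter, _groups in blocks:
        # Assumption: padded 3x3 convolutions keep height and width unchanged;
        # the 2x2 pooling with stride 2 then halves both.
        height, width = height // 2, width // 2
        shapes.append((num_filter, height, width))
    return shapes


if __name__ == "__main__":
    for shape in vgg_feature_shapes():
        print(shape)
```

Under these assumptions the spatial size shrinks 32, 16, 8, 4, 2, 1, so the fully-connected layers receive a 512-dimensional feature vector. The group sizes sum to 13 convolutions; together with the two 512-wide fully-connected layers and the classifier layer, that is presumably the 16 layers the text refers to.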

 ### ResNet


--- a/image_classification/deprecated/README.md
+++ b/image_classification/deprecated/README.md
 图像分类
 =======

-本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍

@@ -244,77 +244,77 @@ $$  lr = lr_{0} * a^ {\lfloor \frac{n}{ b}\rfloor} $$

 1. 定义数据输入及其维度

-	网络输入定义为 `data_layer` (数据层)，在图像分类中即为图像像素信息。CIFRAR10是RGB 3通道32x32大小的彩色图，因此输入数据大小为3072(3x32x32)，类别大小为10，即10分类。
+    网络输入定义为 `data_layer` (数据层)，在图像分类中即为图像像素信息。CIFRAR10是RGB 3通道32x32大小的彩色图，因此输入数据大小为3072(3x32x32)，类别大小为10，即10分类。

-	```python
-	datadim = 3 * 32 * 32
-	classdim = 10
-	data = data_layer(name='image', size=datadim)
-	```
+    ```python
+    datadim = 3 * 32 * 32
+    classdim = 10
+    data = data_layer(name='image', size=datadim)
+    ```

 2. 定义VGG网络核心模块

-	```python
-	net = vgg_bn_drop(data)
-	```
-	VGG核心模块的输入是数据层，`vgg_bn_drop` 定义了16层VGG结构，每层卷积后面引入BN层和Dropout层，详细的定义如下：
-
-	```python
-	def vgg_bn_drop(input, num_channels):
-	    def conv_block(ipt, num_filter, groups, dropouts, num_channels_=None):
-	        return img_conv_group(
-	            input=ipt,
-	            num_channels=num_channels_,
-	            pool_size=2,
-	            pool_stride=2,
-	            conv_num_filter=[num_filter] * groups,
-	            conv_filter_size=3,
-	            conv_act=ReluActivation(),
-	            conv_with_batchnorm=True,
-	            conv_batchnorm_drop_rate=dropouts,
-	            pool_type=MaxPooling())
-
-	    conv1 = conv_block(input, 64, 2, [0.3, 0], 3)
-	    conv2 = conv_block(conv1, 128, 2, [0.4, 0])
-	    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
-	    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
-	    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])
-
-	    drop = dropout_layer(input=conv5, dropout_rate=0.5)
-	    fc1 = fc_layer(input=drop, size=512, act=LinearActivation())
-	    bn = batch_norm_layer(
-	        input=fc1, act=ReluActivation(), layer_attr=ExtraAttr(drop_rate=0.5))
-	    fc2 = fc_layer(input=bn, size=512, act=LinearActivation())
-	    return fc2
-
-	```
-
-	2.1. 首先定义了一组卷积网络，即conv_block。卷积核大小为3x3，池化窗口大小为2x2，窗口滑动大小为2，groups决定每组VGG模块是几次连续的卷积操作，dropouts指定Dropout操作的概率。所使用的`img_conv_group`是在`paddle.trainer_config_helpers`中预定义的模块，由若干组 `Conv->BN->ReLu->Dropout` 和 一组 `Pooling` 组成，
-
-	2.2. 五组卷积操作，即 5个conv_block。 第一、二组采用两次连续的卷积操作。第三、四、五组采用三次连续的卷积操作。每组最后一个卷积后面Dropout概率为0，即不使用Dropout操作。
-
-	2.3. 最后接两层512维的全连接。
+    ```python
+    net = vgg_bn_drop(data)
+    ```
+    VGG核心模块的输入是数据层，`vgg_bn_drop` 定义了16层VGG结构，每层卷积后面引入BN层和Dropout层，详细的定义如下：
+
+    ```python
+    def vgg_bn_drop(input, num_channels):
+        def conv_block(ipt, num_filter, groups, dropouts, num_channels_=None):
+            return img_conv_group(
+                input=ipt,
+                num_channels=num_channels_,
+                pool_size=2,
+                pool_stride=2,
+                conv_num_filter=[num_filter] * groups,
+                conv_filter_size=3,
+                conv_act=ReluActivation(),
+                conv_with_batchnorm=True,
+                conv_batchnorm_drop_rate=dropouts,
+                pool_type=MaxPooling())
+
+        conv1 = conv_block(input, 64, 2, [0.3, 0], 3)
+        conv2 = conv_block(conv1, 128, 2, [0.4, 0])
+        conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
+        conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
+        conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])
+
+        drop = dropout_layer(input=conv5, dropout_rate=0.5)
+        fc1 = fc_layer(input=drop, size=512, act=LinearActivation())
+        bn = batch_norm_layer(
+            input=fc1, act=ReluActivation(), layer_attr=ExtraAttr(drop_rate=0.5))
+        fc2 = fc_layer(input=bn, size=512, act=LinearActivation())
+        return fc2
+
+    ```
+
+    2.1. 首先定义了一组卷积网络，即conv_block。卷积核大小为3x3，池化窗口大小为2x2，窗口滑动大小为2，groups决定每组VGG模块是几次连续的卷积操作，dropouts指定Dropout操作的概率。所使用的`img_conv_group`是在`paddle.trainer_config_helpers`中预定义的模块，由若干组 `Conv->BN->ReLu->Dropout` 和 一组 `Pooling` 组成，
+
+    2.2. 五组卷积操作，即 5个conv_block。 第一、二组采用两次连续的卷积操作。第三、四、五组采用三次连续的卷积操作。每组最后一个卷积后面Dropout概率为0，即不使用Dropout操作。
+
+    2.3. 最后接两层512维的全连接。

 3. 定义分类器

-	通过上面VGG网络提取高层特征，然后经过全连接层映射到类别维度大小的向量，再通过Softmax归一化得到每个类别的概率，也可称作分类器。
+    通过上面VGG网络提取高层特征，然后经过全连接层映射到类别维度大小的向量，再通过Softmax归一化得到每个类别的概率，也可称作分类器。

-	```python
-	out = fc_layer(input=net, size=class_num, act=SoftmaxActivation())
-	```
+    ```python
+    out = fc_layer(input=net, size=class_num, act=SoftmaxActivation())
+    ```

 4. 定义损失函数和网络输出

-	在有监督训练中需要输入图像对应的类别信息，同样通过`data_layer`来定义。训练中采用多类交叉熵作为损失函数，并作为网络的输出，预测阶段定义网络的输出为分类器得到的概率信息。
+    在有监督训练中需要输入图像对应的类别信息，同样通过`data_layer`来定义。训练中采用多类交叉熵作为损失函数，并作为网络的输出，预测阶段定义网络的输出为分类器得到的概率信息。

-	```python
-	if not is_predict:
-	    lbl = data_layer(name="label", size=class_num)
-	    cost = classification_cost(input=out, label=lbl)
-	    outputs(cost)
-	else:
-	    outputs(out)
-	```
+    ```python
+    if not is_predict:
+        lbl = data_layer(name="label", size=class_num)
+        cost = classification_cost(input=out, label=lbl)
+        outputs(cost)
+    else:
+        outputs(out)
+    ```

 ### ResNet


--- a/image_classification/index.en.html
+++ b/image_classification/index.en.html
@@ -43,7 +43,7 @@
 Image Classification
 =======================

-The source code of this chapter is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). For the first-time users, please refer to PaddlePaddle [Installation Tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) for installation instructions.
+The source code of this chapter is in [book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification). For the first-time users, please refer to PaddlePaddle [Installation Tutorial](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst) for installation instructions.

 ## Background


--- a/image_classification/index.html
+++ b/image_classification/index.html
@@ -43,7 +43,7 @@
 图像分类
 =======

-本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/image_classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍

@@ -215,24 +215,24 @@ paddle.init(use_gpu=False, trainer_count=1)

 1. 定义数据输入及其维度

-	网络输入定义为 `data_layer` (数据层)，在图像分类中即为图像像素信息。CIFRAR10是RGB 3通道32x32大小的彩色图，因此输入数据大小为3072(3x32x32)，类别大小为10，即10分类。
+    网络输入定义为 `data_layer` (数据层)，在图像分类中即为图像像素信息。CIFRAR10是RGB 3通道32x32大小的彩色图，因此输入数据大小为3072(3x32x32)，类别大小为10，即10分类。

-	```python
+    ```python
    datadim = 3 * 32 * 32
    classdim = 10

    image = paddle.layer.data(
        name="image", type=paddle.data_type.dense_vector(datadim))
-	```
+    ```

 2. 定义VGG网络核心模块

-	```python
-	net = vgg_bn_drop(image)
-	```
-	VGG核心模块的输入是数据层，`vgg_bn_drop` 定义了16层VGG结构，每层卷积后面引入BN层和Dropout层，详细的定义如下：
+    ```python
+    net = vgg_bn_drop(image)
+    ```
+    VGG核心模块的输入是数据层，`vgg_bn_drop` 定义了16层VGG结构，每层卷积后面引入BN层和Dropout层，详细的定义如下：

-	```python
+    ```python
    def vgg_bn_drop(input):
        def conv_block(ipt, num_filter, groups, dropouts, num_channels=None):
            return paddle.networks.img_conv_group(
@@ -261,33 +261,33 @@ paddle.init(use_gpu=False, trainer_count=1)
            layer_attr=paddle.attr.Extra(drop_rate=0.5))
        fc2 = paddle.layer.fc(input=bn, size=512, act=paddle.activation.Linear())
        return fc2
-	```
+    ```

-	2.1. 首先定义了一组卷积网络，即conv_block。卷积核大小为3x3，池化窗口大小为2x2，窗口滑动大小为2，groups决定每组VGG模块是几次连续的卷积操作，dropouts指定Dropout操作的概率。所使用的`img_conv_group`是在`paddle.networks`中预定义的模块，由若干组 `Conv->BN->ReLu->Dropout` 和 一组 `Pooling` 组成，
+    2.1. 首先定义了一组卷积网络，即conv_block。卷积核大小为3x3，池化窗口大小为2x2，窗口滑动大小为2，groups决定每组VGG模块是几次连续的卷积操作，dropouts指定Dropout操作的概率。所使用的`img_conv_group`是在`paddle.networks`中预定义的模块，由若干组 `Conv->BN->ReLU->Dropout` 和 一组 `Pooling` 组成。

-	2.2. 五组卷积操作，即 5个conv_block。 第一、二组采用两次连续的卷积操作。第三、四、五组采用三次连续的卷积操作。每组最后一个卷积后面Dropout概率为0，即不使用Dropout操作。
+    2.2. 五组卷积操作，即 5个conv_block。 第一、二组采用两次连续的卷积操作。第三、四、五组采用三次连续的卷积操作。每组最后一个卷积后面Dropout概率为0，即不使用Dropout操作。

-	2.3. 最后接两层512维的全连接。
+    2.3. 最后接两层512维的全连接。

 3. 定义分类器

-	通过上面VGG网络提取高层特征，然后经过全连接层映射到类别维度大小的向量，再通过Softmax归一化得到每个类别的概率，也可称作分类器。
+    通过上面VGG网络提取高层特征，然后经过全连接层映射到类别维度大小的向量，再通过Softmax归一化得到每个类别的概率，也可称作分类器。

-	```python
+    ```python
    out = paddle.layer.fc(input=net,
                          size=classdim,
                          act=paddle.activation.Softmax())
-	```
+    ```

 4. 定义损失函数和网络输出

-	在有监督训练中需要输入图像对应的类别信息，同样通过`paddle.layer.data`来定义。训练中采用多类交叉熵作为损失函数，并作为网络的输出，预测阶段定义网络的输出为分类器得到的概率信息。
+    在有监督训练中需要输入图像对应的类别信息，同样通过`paddle.layer.data`来定义。训练中采用多类交叉熵作为损失函数，并作为网络的输出，预测阶段定义网络的输出为分类器得到的概率信息。

-	```python
+    ```python
    lbl = paddle.layer.data(
        name="label", type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)
-	```
+    ```

 ### ResNet


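Review note on the hunk above: the shape bookkeeping in steps 1 and 2 can be checked with a few lines of plain Python. This is only a sketch. It assumes each 3x3 convolution keeps the spatial size unchanged (the padding is not visible in the hunk), and it assumes the "16 layers" in the text count the 13 convolutions, the two 512-dimensional fully connected layers, and the classifier's fully connected layer. The names `conv_groups` and `feature_map_sizes` are illustrative and do not appear in the tutorial.

```python
# Input size for a CIFAR10 image: 3 channels of 32x32 pixels.
channels, height, width = 3, 32, 32
datadim = channels * height * width
classdim = 10

# Consecutive convolutions per conv_block, as listed in step 2.2.
conv_groups = [2, 2, 3, 3, 3]


def feature_map_sizes(size, num_blocks, pool_stride=2):
    """Spatial size after each block's 2x2, stride-2 pooling.

    Assumes the 3x3 convolutions leave the spatial size unchanged.
    """
    sizes = []
    for _ in range(num_blocks):
        size //= pool_stride
        sizes.append(size)
    return sizes


num_conv_layers = sum(conv_groups)
# Two 512-d fully connected layers plus the classifier layer (assumed count).
num_fc_layers = 2 + 1

print(datadim)                                       # 3072
print(feature_map_sizes(height, len(conv_groups)))   # [16, 8, 4, 2, 1]
print(num_conv_layers + num_fc_layers)               # 16
```

Under these assumptions the five pooling steps reduce a 32x32 image to a 1x1 feature map before the fully connected layers.
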
--- a/label_semantic_roles/README.en.md
+++ b/label_semantic_roles/README.en.md
@@ -2,6 +2,8 @@

 Source code of this chapter is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles).

+For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
+
 ## Background

 Natural Language Analysis contains three components: Lexical Analysis, Syntactic Analysis, and Semantic Analysis. Semantic Role Labelling (SRL) is one way for Shallow Semantic Analysis. A predicate of a sentence is a property that a subject possesses or is characterized, such as what it does, what it is or how it is, which mostly corresponds to the core of an event. The noun associated with a predicate is called Argument. Semantic roles express the abstract roles that arguments of a predicate can take in the event, such as Agent, Patient, Theme, Experiencer, Beneficiary, Instrument, Location, Goal and Source, etc.

--- a/label_semantic_roles/README.md
+++ b/label_semantic_roles/README.md
 # 语义角色标注

-本教程源代码目录在[book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍


--- a/label_semantic_roles/index.en.html
+++ b/label_semantic_roles/index.en.html
@@ -44,6 +44,8 @@

 Source code of this chapter is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles).

+For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
+
 ## Background

 Natural Language Analysis contains three components: Lexical Analysis, Syntactic Analysis, and Semantic Analysis. Semantic Role Labelling (SRL) is one way for Shallow Semantic Analysis. A predicate of a sentence is a property that a subject possesses or is characterized, such as what it does, what it is or how it is, which mostly corresponds to the core of an event. The noun associated with a predicate is called Argument. Semantic roles express the abstract roles that arguments of a predicate can take in the event, such as Agent, Patient, Theme, Experiencer, Beneficiary, Instrument, Location, Goal and Source, etc.

--- a/label_semantic_roles/index.html
+++ b/label_semantic_roles/index.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 语义角色标注

-本教程源代码目录在[book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍


--- a/machine_translation/README.en.md
+++ b/machine_translation/README.en.md
 # Machine Translation

-The source codes is located at [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation). Please refer to the PaddlePaddle [installation tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) if you are a first time user.
+The source code is located at [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation). Please refer to the PaddlePaddle [installation tutorial](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst) if you are a first-time user.

 ## Background


--- a/machine_translation/README.md
+++ b/machine_translation/README.md
 # 机器翻译

-本教程源代码目录在[book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍

@@ -297,12 +297,12 @@ wmt14_reader = paddle.batch(

 1. 定义解码器框架名字，和`gru_decoder_with_attention`函数的前两个输入。注意：这两个输入使用`StaticInput`，具体说明可见[StaticInput文档](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入)。

-	```python
+    ```python
    decoder_group_name = "decoder_group"
    group_input1 = paddle.layer.StaticInputV2(input=encoded_vector, is_seq=True)
    group_input2 = paddle.layer.StaticInputV2(input=encoded_proj, is_seq=True)
    group_inputs = [group_input1, group_input2]
-	```
+    ```

 1. 训练模式下的解码器调用：

@@ -311,7 +311,7 @@ wmt14_reader = paddle.batch(
   - 接着，使用目标语言的下一个词序列作为标签层lbl，即预测目标词。
   - 最后，用多类交叉熵损失函数`classification_cost`来计算损失值。

-	```python
+    ```python
    trg_embedding = paddle.layer.embedding(
        input=paddle.layer.data(
            name='target_language_word',
@@ -334,7 +334,7 @@ wmt14_reader = paddle.batch(
        name='target_language_next_word',
        type=paddle.data_type.integer_value_sequence(target_dict_dim))
    cost = paddle.layer.classification_cost(input=decoder, label=lbl)
-	```
+    ```

 注意：我们提供的配置在Bahdanau的论文\[[4](#参考文献)\]上做了一些简化，可参考[issue #1133](https://github.com/PaddlePaddle/Paddle/issues/1133)。

@@ -402,7 +402,7 @@ Pass 0, Batch 10, Cost 335.896802, {'classification_error_evaluator': 0.93251532
 .........
 ```

-	当`classification_error_evaluator`的值低于0.35的时候，表示训练成功。
+    当`classification_error_evaluator`的值低于0.35的时候，表示训练成功。

 ## 应用模型


--- a/machine_translation/index.en.html
+++ b/machine_translation/index.en.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # Machine Translation

-The source codes is located at [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation). Please refer to the PaddlePaddle [installation tutorial](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html) if you are a first time user.
+The source code is located at [book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation). Please refer to the PaddlePaddle [installation tutorial](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst) if you are a first-time user.

 ## Background


--- a/machine_translation/index.html
+++ b/machine_translation/index.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 机器翻译

-本教程源代码目录在[book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/machine_translation](https://github.com/PaddlePaddle/book/tree/develop/machine_translation)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍

@@ -339,12 +339,12 @@ wmt14_reader = paddle.batch(

 1. 定义解码器框架名字，和`gru_decoder_with_attention`函数的前两个输入。注意：这两个输入使用`StaticInput`，具体说明可见[StaticInput文档](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入)。

-	```python
+    ```python
    decoder_group_name = "decoder_group"
    group_input1 = paddle.layer.StaticInputV2(input=encoded_vector, is_seq=True)
    group_input2 = paddle.layer.StaticInputV2(input=encoded_proj, is_seq=True)
    group_inputs = [group_input1, group_input2]
-	```
+    ```

 1. 训练模式下的解码器调用：

@@ -353,7 +353,7 @@ wmt14_reader = paddle.batch(
   - 接着，使用目标语言的下一个词序列作为标签层lbl，即预测目标词。
   - 最后，用多类交叉熵损失函数`classification_cost`来计算损失值。

-	```python
+    ```python
    trg_embedding = paddle.layer.embedding(
        input=paddle.layer.data(
            name='target_language_word',
@@ -376,7 +376,7 @@ wmt14_reader = paddle.batch(
        name='target_language_next_word',
        type=paddle.data_type.integer_value_sequence(target_dict_dim))
    cost = paddle.layer.classification_cost(input=decoder, label=lbl)
-	```
+    ```

 注意：我们提供的配置在Bahdanau的论文\[[4](#参考文献)\]上做了一些简化，可参考[issue #1133](https://github.com/PaddlePaddle/Paddle/issues/1133)。

@@ -444,7 +444,7 @@ Pass 0, Batch 10, Cost 335.896802, {'classification_error_evaluator': 0.93251532
 .........
 ```

-	当`classification_error_evaluator`的值低于0.35的时候，表示训练成功。
+    当`classification_error_evaluator`的值低于0.35的时候，表示训练成功。

 ## 应用模型


--- a/pre-commit-hooks/convert_markdown_into_ipynb.sh
+++ b/pre-commit-hooks/convert_markdown_into_ipynb.sh
+#!/bin/sh
+# Convert each Markdown file passed in by pre-commit into a notebook next to it.
+for file in "$@" ; do
+    /tmp/go/bin/markdown-to-ipynb < "$file" > "${file%.*}.ipynb"
+    if [ $? -ne 0 ]; then
+        echo >&2 "markdown-to-ipynb $file error"
+        exit 1
+    fi
+done
+
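Review note on the hook above: the `${file%.*}` expansion strips the shortest trailing `.suffix`, so `README.en.md` becomes `README.en.ipynb` rather than `README.ipynb`. A minimal Python sketch of the same path mapping follows. `notebook_path` is a hypothetical helper for illustration, not something the hook calls, and it matches the shell expansion only for paths whose file name carries an extension (which the hook's `\.md$` filter guarantees).

```python
import os.path


def notebook_path(markdown_path):
    """Return the .ipynb path that would sit next to a Markdown file."""
    stem, _ext = os.path.splitext(markdown_path)
    return stem + ".ipynb"


for path in ["fit_a_line/README.md", "fit_a_line/README.en.md"]:
    print(path, "->", notebook_path(path))
```
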
--- a/recognize_digits/README.en.md
+++ b/recognize_digits/README.en.md
 # Recognize Digits

-The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time readers, please refer to PaddlePaddle [installation instructions](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time readers, please refer to PaddlePaddle [installation instructions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).

 ## Introduction
 When we learn a new programming language, the first task is usually to write a program that prints "Hello World." In Machine Learning or Deep Learning, the equivalent task is to train a model to perform handwritten digit recognition with [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. Handwriting recognition is a typical image classification problem. The problem is relatively easy, and MNIST is a complete dataset. As a simple Computer Vision dataset, MNIST contains images of handwritten digits and their corresponding labels (Fig. 1). The input image is a 28x28 matrix, and the label is one of the digits from 0 to 9. Each image is normalized in size and centered.

--- a/recognize_digits/README.md
+++ b/recognize_digits/README.md
 # 识别数字

-本教程源代码目录在[book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍
 当我们学习编程的时候，编写的第一个程序一般是实现打印"Hello World"。而机器学习（或深度学习）的入门教程，一般都是 [MNIST](http://yann.lecun.com/exdb/mnist/) 数据库上的手写识别问题。原因是手写识别属于典型的图像分类问题，比较简单，同时MNIST数据集也很完备。MNIST数据集作为一个简单的计算机视觉数据集，包含一系列如图1所示的手写数字图片和对应的标签。图片是28x28的像素矩阵，标签则对应着0~9的10个数字。每张图片都经过了大小归一化和居中处理。

--- a/recognize_digits/index.en.html
+++ b/recognize_digits/index.en.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # Recognize Digits

-The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time readers, please refer to PaddlePaddle [installation instructions](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code for this tutorial is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time readers, please refer to PaddlePaddle [installation instructions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).

 ## Introduction
 When we learn a new programming language, the first task is usually to write a program that prints "Hello World." In Machine Learning or Deep Learning, the equivalent task is to train a model to perform handwritten digit recognition with [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. Handwriting recognition is a typical image classification problem. The problem is relatively easy, and MNIST is a complete dataset. As a simple Computer Vision dataset, MNIST contains images of handwritten digits and their corresponding labels (Fig. 1). The input image is a 28x28 matrix, and the label is one of the digits from 0 to 9. Each image is normalized in size and centered.

--- a/recognize_digits/index.html
+++ b/recognize_digits/index.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 识别数字

-本教程源代码目录在[book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍
 当我们学习编程的时候，编写的第一个程序一般是实现打印"Hello World"。而机器学习（或深度学习）的入门教程，一般都是 [MNIST](http://yann.lecun.com/exdb/mnist/) 数据库上的手写识别问题。原因是手写识别属于典型的图像分类问题，比较简单，同时MNIST数据集也很完备。MNIST数据集作为一个简单的计算机视觉数据集，包含一系列如图1所示的手写数字图片和对应的标签。图片是28x28的像素矩阵，标签则对应着0~9的10个数字。每张图片都经过了大小归一化和居中处理。

--- a/recommender_system/README.en.ipynb
+++ b/recommender_system/README.en.ipynb
--- a/recommender_system/README.en.md
+++ b/recommender_system/README.en.md
@@ -2,6 +2,9 @@

 The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system).

+For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
+
+
 ## Background

 With the fast growth of e-commerce, online videos, and online reading business, users have to rely on recommender systems to avoid manually browsing tremendous volume of choices.  Recommender systems understand users' interest by mining user behavior and other properties of users and products.

--- a/recommender_system/README.ipynb
+++ b/recommender_system/README.ipynb
--- a/recommender_system/README.md
+++ b/recommender_system/README.md
 # 个性化推荐

-本教程源代码目录在[book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍


--- a/recommender_system/index.en.html
+++ b/recommender_system/index.en.html
@@ -44,6 +44,9 @@

 The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system).

+For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
+
+
 ## Background

 With the fast growth of e-commerce, online videos, and online reading business, users have to rely on recommender systems to avoid manually browsing tremendous volume of choices.  Recommender systems understand users' interest by mining user behavior and other properties of users and products.

--- a/recommender_system/index.html
+++ b/recommender_system/index.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 个性化推荐

-本教程源代码目录在[book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍


--- a/tools/convert-markdown-into-ipynb-and-test.sh
+++ b/tools/convert-markdown-into-ipynb-and-test.sh
+#!/bin/bash
+# Convert every chapter's README.md and README.en.md into a Jupyter notebook and,
+# when TEST_EMBEDDED_PYTHON_SCRIPTS is set, run the Python code in each notebook.
+# Requires bash: the script uses brace expansion, [[ ]], pushd and popd.
+command -v go >/dev/null 2>&1
+if [ $? -ne 0 ]; then
+    echo >&2 "Please install go https://golang.org/doc/install#install"
+    exit 1
+fi
+
+GOPATH=/tmp/go go get -u github.com/wangkuiyi/ipynb/markdown-to-ipynb
+
+cur_path=$(dirname "$(readlink -f "$0")")
+cd "$cur_path/../" || exit 1
+
+# convert md to ipynb
+for file in */{README,README\.en}.md ; do
+    /tmp/go/bin/markdown-to-ipynb < "$file" > "${file%.*}.ipynb"
+    if [ $? -ne 0 ]; then
+        echo >&2 "markdown-to-ipynb $file error"
+        exit 1
+    fi
+done
+
+if [[ -z $TEST_EMBEDDED_PYTHON_SCRIPTS ]]; then
+    exit 0
+fi
+
+# run the Python code extracted from each notebook
+for file in */{README,README\.en}.ipynb ; do
+    pushd "$PWD" > /dev/null
+    cd "$(dirname "$file")" > /dev/null
+
+    echo "begin test $file"
+    jupyter nbconvert --to python "$(basename "$file")" --stdout | python
+
+    popd > /dev/null
+    #break
+done
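
Review note on the script above: its two decisions (which files to convert, and whether to run the embedded Python) can be sketched in Python as follows. This is an illustration only. `chapter_readmes` and `should_run_tests` are hypothetical names, and the sketch returns paths in sorted order, which may differ from the order the shell glob produces.

```python
import os
from pathlib import Path


def chapter_readmes(repo_root):
    """Paths matching */README.md and */README.en.md under repo_root, sorted."""
    root = Path(repo_root)
    found = []
    for name in ("README.md", "README.en.md"):
        found.extend(root.glob("*/" + name))
    return sorted(found)


def should_run_tests(environ=os.environ):
    """Run the notebooks only when TEST_EMBEDDED_PYTHON_SCRIPTS is non-empty."""
    return bool(environ.get("TEST_EMBEDDED_PYTHON_SCRIPTS"))
```

As in the script, an unset or empty `TEST_EMBEDDED_PYTHON_SCRIPTS` means the notebooks are generated but not executed.
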
--- a/understand_sentiment/README.en.md
+++ b/understand_sentiment/README.en.md
 # Sentiment Analysis

-The source codes of this section can be located at [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment). First-time users may refer to PaddlePaddle for [Installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code of this section is located at [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment). First-time users may refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).

 ## Background


--- a/understand_sentiment/README.md
+++ b/understand_sentiment/README.md
 # 情感分析

-本教程源代码目录在[book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍
 在自然语言处理中，情感分析一般是指判断一段文本所表达的情绪状态。其中，一段文本可以是一个句子，一个段落或一个文档。情绪状态可以是两类，如（正面，负面），（高兴，悲伤）；也可以是三类，如（积极，消极，中性）等等。情感分析的应用场景十分广泛，如把用户在购物网站（亚马逊、天猫、淘宝等）、旅游网站、电影评论网站上发表的评论分成正面评论和负面评论；或为了分析用户对于某一产品的整体使用感受，抓取产品的用户评论并进行情感分析等等。表格1展示了对电影评论进行情感分析的例子：

--- a/understand_sentiment/index.en.html
+++ b/understand_sentiment/index.en.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # Sentiment Analysis

-The source codes of this section can be located at [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment). First-time users may refer to PaddlePaddle for [Installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+The source code of this section is located at [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment). First-time users may refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).

 ## Background


--- a/understand_sentiment/index.html
+++ b/understand_sentiment/index.html
@@ -42,7 +42,7 @@
 <div id="markdown" style='display:none'>
 # 情感分析

-本教程源代码目录在[book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍
 在自然语言处理中，情感分析一般是指判断一段文本所表达的情绪状态。其中，一段文本可以是一个句子，一个段落或一个文档。情绪状态可以是两类，如（正面，负面），（高兴，悲伤）；也可以是三类，如（积极，消极，中性）等等。情感分析的应用场景十分广泛，如把用户在购物网站（亚马逊、天猫、淘宝等）、旅游网站、电影评论网站上发表的评论分成正面评论和负面评论；或为了分析用户对于某一产品的整体使用感受，抓取产品的用户评论并进行情感分析等等。表格1展示了对电影评论进行情感分析的例子：

--- a/word2vec/README.en.md
+++ b/word2vec/README.en.md
@@ -2,7 +2,7 @@

 This is intended as a reference tutorial. The source code of this tutorial lives on [book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/word2vec).

-For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).

 ## Background Introduction


--- a/word2vec/README.md
+++ b/word2vec/README.md

 # 词向量

-本教程源代码目录在[book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/word2vec)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/word2vec)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍

@@ -32,8 +32,8 @@ $$X = USV^T$$
 本章中，当词向量训练好后，我们可以用数据可视化算法t-SNE\[[4](#参考文献)\]画出词语特征在二维上的投影（如下图所示）。从图中可以看出，语义相关的词语（如a, the, these; big, huge）在投影上距离很近，语意无关的词（如say, business; decision, japan）在投影上的距离很远。

 <p align="center">
-	<img src = "image/2d_similarity.png" width=400><br/>
-	图1. 词向量的二维投影
+    <img src = "image/2d_similarity.png" width=400><br/>
+    图1. 词向量的二维投影
 </p>

 另一方面，我们知道两个向量的余弦值在$[-1,1]$的区间内：两个完全相同的向量余弦值为1, 两个相互垂直的向量之间余弦值为0，两个方向完全相反的向量余弦值为-1，即相关性和余弦值大小成正比。因此我们还可以计算两个词向量的余弦相似度:
@@ -86,8 +86,8 @@ $$\frac{1}{T}\sum_t f(w_t, w_{t-1}, ..., w_{t-n+1};\theta) + R(\theta)$$
 其中$f(w_t, w_{t-1}, ..., w_{t-n+1})$表示根据历史n-1个词得到当前词$w_t$的条件概率，$R(\theta)$表示参数正则项。

 <p align="center">
-   	<img src="image/nnlm.png" width=500><br/>
-   	图2. N-gram神经网络模型
+       <img src="image/nnlm.png" width=500><br/>
+       图2. N-gram神经网络模型
 </p>

 图2展示了N-gram神经网络模型，从下往上看，该模型分为以下几个部分：
@@ -97,7 +97,7 @@ $$\frac{1}{T}\sum_t f(w_t, w_{t-1}, ..., w_{t-n+1};\theta) + R(\theta)$$

 - 然后所有词语的词向量连接成一个大向量，并经过一个非线性映射得到历史词语的隐层表示：

-	$$g=Utanh(\theta^Tx + b_1) + Wx + b_2$$
+    $$g=U\tanh(\theta^Tx + b_1) + Wx + b_2$$

    其中，$x$为所有词语的词向量连接成的大向量，表示文本历史特征；$\theta$、$U$、$b_1$、$b_2$和$W$分别为词向量层到隐层连接的参数。$g$表示未经归一化的所有输出单词概率，$g_i$表示未经归一化的字典中第$i$个单词的输出概率。

@@ -118,8 +118,8 @@ $$\frac{1}{T}\sum_t f(w_t, w_{t-1}, ..., w_{t-n+1};\theta) + R(\theta)$$
 CBOW模型通过一个词的上下文（各N个词）预测当前词。当N=2时，模型如下图所示：

 <p align="center">
-	<img src="image/cbow.png" width=250><br/>
-	图3. CBOW模型
+    <img src="image/cbow.png" width=250><br/>
+    图3. CBOW模型
 </p>

 具体来说，不考虑上下文的词语输入顺序，CBOW是用上下文词语的词向量的均值来预测当前词。即：
@@ -133,8 +133,8 @@ $$context = \frac{x_{t-1} + x_{t-2} + x_{t+1} + x_{t+2}}{4}$$
 CBOW的好处是对上下文词语的分布在词向量上进行了平滑，去掉了噪声，因此在小数据集上很有效。而Skip-gram的方法中，用一个词预测其上下文，得到了当前词上下文的很多样本，因此可用于更大的数据集。

 <p align="center">
-	<img src="image/skipgram.png" width=250><br/>
-	图4. Skip-gram模型
+    <img src="image/skipgram.png" width=250><br/>
+    图4. Skip-gram模型
 </p>

 如上图所示，Skip-gram模型的具体做法是，将一个词的词向量映射到$2n$个词的词向量（$2n$表示当前输入词的前后各$n$个词），然后分别通过softmax得到这$2n$个词的分类损失值之和。
@@ -148,21 +148,21 @@ CBOW的好处是对上下文词语的分布在词向量上进行了平滑，去

 <p align="center">
 <table>
-	<tr>
-		<td>训练数据</td>
-		<td>验证数据</td>
-		<td>测试数据</td>
-	</tr>
-	<tr>
-		<td>ptb.train.txt</td>
-		<td>ptb.valid.txt</td>
-		<td>ptb.test.txt</td>
-	</tr>
-	<tr>
-		<td>42068句</td>
-		<td>3370句</td>
-		<td>3761句</td>
-	</tr>
+    <tr>
+        <td>训练数据</td>
+        <td>验证数据</td>
+        <td>测试数据</td>
+    </tr>
+    <tr>
+        <td>ptb.train.txt</td>
+        <td>ptb.valid.txt</td>
+        <td>ptb.test.txt</td>
+    </tr>
+    <tr>
+        <td>42068句</td>
+        <td>3370句</td>
+        <td>3761句</td>
+    </tr>
 </table>
 </p>

@@ -189,8 +189,8 @@ dream that one day <e>
 本配置的模型结构如下图所示：

 <p align="center">
-	<img src="image/ngram.png" width=400><br/>
-	图5. 模型配置中的N-gram神经网络模型
+    <img src="image/ngram.png" width=400><br/>
+    图5. 模型配置中的N-gram神经网络模型
 </p>

 首先，加载所需要的包：

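Review note on the word2vec hunks above: two of the formulas in the context lines, the cosine similarity of two word vectors and the CBOW context average for N=2, can be sketched in plain Python. The vectors below are made-up toy values rather than trained embeddings, and the function names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; the value lies in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cbow_context(vectors):
    """Element-wise mean of the context word vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))    # identical vectors
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # perpendicular vectors
print(cosine_similarity([3.0, 4.0], [-3.0, -4.0]))  # opposite vectors

# Toy vectors standing in for x_{t-2}, x_{t-1}, x_{t+1}, x_{t+2}.
print(cbow_context([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]))
```

The three similarity calls cover the cases named in the text (identical, perpendicular, opposite), and the last call averages four context vectors as in the CBOW formula.
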
--- a/word2vec/index.en.html
+++ b/word2vec/index.en.html
@@ -44,7 +44,7 @@

 This is intended as a reference tutorial. The source code of this tutorial lives on [book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/word2vec).

-For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
+For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).

 ## Background Introduction


--- a/word2vec/index.html
+++ b/word2vec/index.html
@@ -43,7 +43,7 @@

 # 词向量

-本教程源代码目录在[book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/word2vec)， 初次使用请参考PaddlePaddle[安装教程](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)。
+本教程源代码目录在[book/word2vec](https://github.com/PaddlePaddle/book/tree/develop/word2vec)， 初次使用请参考PaddlePaddle[安装教程](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst)。

 ## 背景介绍

@@ -74,8 +74,8 @@ $$X = USV^T$$
 本章中，当词向量训练好后，我们可以用数据可视化算法t-SNE\[[4](#参考文献)\]画出词语特征在二维上的投影（如下图所示）。从图中可以看出，语义相关的词语（如a, the, these; big, huge）在投影上距离很近，语意无关的词（如say, business; decision, japan）在投影上的距离很远。

 <p align="center">
-	<img src = "image/2d_similarity.png" width=400><br/>
-	图1. 词向量的二维投影
+    <img src = "image/2d_similarity.png" width=400><br/>
+    图1. 词向量的二维投影
 </p>

 另一方面，我们知道两个向量的余弦值在$[-1,1]$的区间内：两个完全相同的向量余弦值为1, 两个相互垂直的向量之间余弦值为0，两个方向完全相反的向量余弦值为-1，即相关性和余弦值大小成正比。因此我们还可以计算两个词向量的余弦相似度:
@@ -128,8 +128,8 @@ $$\frac{1}{T}\sum_t f(w_t, w_{t-1}, ..., w_{t-n+1};\theta) + R(\theta)$$
 其中$f(w_t, w_{t-1}, ..., w_{t-n+1})$表示根据历史n-1个词得到当前词$w_t$的条件概率，$R(\theta)$表示参数正则项。

 <p align="center">
-   	<img src="image/nnlm.png" width=500><br/>
-   	图2. N-gram神经网络模型
+       <img src="image/nnlm.png" width=500><br/>
+       图2. N-gram神经网络模型
 </p>

 图2展示了N-gram神经网络模型，从下往上看，该模型分为以下几个部分：
@@ -139,7 +139,7 @@ $$\frac{1}{T}\sum_t f(w_t, w_{t-1}, ..., w_{t-n+1};\theta) + R(\theta)$$

 - 然后所有词语的词向量连接成一个大向量，并经过一个非线性映射得到历史词语的隐层表示：

-	$$g=Utanh(\theta^Tx + b_1) + Wx + b_2$$
+    $$g=U\tanh(\theta^Tx + b_1) + Wx + b_2$$

    其中，$x$为所有词语的词向量连接成的大向量，表示文本历史特征；$\theta$、$U$、$b_1$、$b_2$和$W$分别为词向量层到隐层连接的参数。$g$表示未经归一化的所有输出单词概率，$g_i$表示未经归一化的字典中第$i$个单词的输出概率。

@@ -160,8 +160,8 @@ $$\frac{1}{T}\sum_t f(w_t, w_{t-1}, ..., w_{t-n+1};\theta) + R(\theta)$$
 CBOW模型通过一个词的上下文（各N个词）预测当前词。当N=2时，模型如下图所示：

 <p align="center">
-	<img src="image/cbow.png" width=250><br/>
-	图3. CBOW模型
+    <img src="image/cbow.png" width=250><br/>
+    图3. CBOW模型
 </p>

 具体来说，不考虑上下文的词语输入顺序，CBOW是用上下文词语的词向量的均值来预测当前词。即：
@@ -175,8 +175,8 @@ $$context = \frac{x_{t-1} + x_{t-2} + x_{t+1} + x_{t+2}}{4}$$
 CBOW的好处是对上下文词语的分布在词向量上进行了平滑，去掉了噪声，因此在小数据集上很有效。而Skip-gram的方法中，用一个词预测其上下文，得到了当前词上下文的很多样本，因此可用于更大的数据集。

 <p align="center">
-	<img src="image/skipgram.png" width=250><br/>
-	图4. Skip-gram模型
+    <img src="image/skipgram.png" width=250><br/>
+    图4. Skip-gram模型
 </p>

 如上图所示，Skip-gram模型的具体做法是，将一个词的词向量映射到$2n$个词的词向量（$2n$表示当前输入词的前后各$n$个词），然后分别通过softmax得到这$2n$个词的分类损失值之和。
@@ -190,21 +190,21 @@ CBOW的好处是对上下文词语的分布在词向量上进行了平滑，去

 <p align="center">
 <table>
-	<tr>
-		<td>训练数据</td>
-		<td>验证数据</td>
-		<td>测试数据</td>
-	</tr>
-	<tr>
-		<td>ptb.train.txt</td>
-		<td>ptb.valid.txt</td>
-		<td>ptb.test.txt</td>
-	</tr>
-	<tr>
-		<td>42068句</td>
-		<td>3370句</td>
-		<td>3761句</td>
-	</tr>
+    <tr>
+        <td>训练数据</td>
+        <td>验证数据</td>
+        <td>测试数据</td>
+    </tr>
+    <tr>
+        <td>ptb.train.txt</td>
+        <td>ptb.valid.txt</td>
+        <td>ptb.test.txt</td>
+    </tr>
+    <tr>
+        <td>42068句</td>
+        <td>3370句</td>
+        <td>3761句</td>
+    </tr>
 </table>
 </p>

@@ -231,8 +231,8 @@ dream that one day <e>
 本配置的模型结构如下图所示：

 <p align="center">
-	<img src="image/ngram.png" width=400><br/>
-	图5. 模型配置中的N-gram神经网络模型
+    <img src="image/ngram.png" width=400><br/>
+    图5. 模型配置中的N-gram神经网络模型
 </p>

 首先，加载所需要的包：