diff --git a/01.fit_a_line/README.cn.md b/01.fit_a_line/README.cn.md
index e5a884f9f4cbb47bf7fa0aff2d439399da66c885..12c52bae019145a8c652797dca70d6f7c1f8b673 100644
--- a/01.fit_a_line/README.cn.md
+++ b/01.fit_a_line/README.cn.md
@@ -3,10 +3,21 @@
 
 The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line). If you are new to the book, please see the [Book usage instructions](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
 
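+### Notes: ###
+1. Hardware requirements:
+This tutorial can run on either CPU or GPU.
+2. CUDA/cuDNN versions supported by the Docker image:
+If you run the Book with Docker, note that the default image provides a GPU environment with CUDA 8/cuDNN 5. On GPUs that require CUDA 9, such as the NVIDIA Tesla V100, this image may fail to run.
+3. Consistency between the code in this document and the script:
+Note: to make this tutorial easier to read and use, we split and rearranged the code from train.py and included it here. The code in this tutorial produces the same results as train.py; you can run [train.py](https://github.com/PaddlePaddle/book/blob/develop/01.fit_a_line/train.py) directly to verify this.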
+
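 ## Background
 Given a dataset of size $n$, ${\{y_{i}, x_{i1}, \ldots, x_{id}\}}_{i=1}^{n}$, where $x_{i1}, \ldots, x_{id}$ are the values of the $d$ attributes of the $i$-th sample and $y_i$ is the target to be predicted for that sample, the linear regression model assumes that $y_i$ can be described by a linear combination of the attributes, that is: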
 
-$$y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b,  i=1,\ldots,n$$
+
+<p align="center">
+    <img src = "https://github.com/ceci3/book/blob/update_fit_a_line/01.fit_a_line/image/formula_fit_a_line_1.png?raw=true" width=550><br/>
+</p>
 
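 For example, in the house price prediction problem we are going to model, $x_{ij}$ are the attributes describing house $i$ (such as the number of rooms, the number of nearby schools and hospitals, and traffic conditions), and $y_i$ is the price of the house.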
 
@@ -25,7 +36,9 @@ $$y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b,  i=1,\ldo
 
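 In the Boston house price dataset, each record has 14 values: the first 13 describe various attributes of the houses, i.e. the $x_i$ in the model; the last is the median price of houses of that type, which we want to predict, i.e. the $y_i$ in the model. Our model can therefore be written as: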
 
-$$\hat{Y} = \omega_1X_{1} + \omega_2X_{2} + \ldots + \omega_{13}X_{13} + b$$
+<p align="center">
+    <img src = "https://github.com/ceci3/book/blob/update_fit_a_line/01.fit_a_line/image/formula_fit_a_line_2.png?raw=true" width=350><br/>
+</p>
 
 $\hat{Y}$ denotes the model's prediction, to distinguish it from the true value $Y$. The parameters to be learned are $\omega_1, \ldots, \omega_{13}, b$.
 
@@ -33,13 +46,17 @@ $\hat{Y}$ 表示模型的预测结果，用来和真实值$Y$区分。模型要
 
 For linear regression, the most common loss function is the mean squared error ([MSE](https://en.wikipedia.org/wiki/Mean_squared_error)), which has the form:
 
-$$MSE=\frac{1}{n}\sum_{i=1}^{n}{(\hat{Y_i}-Y_i)}^2$$
+<p align="center">
+    <img src = "https://github.com/ceci3/book/blob/update_fit_a_line/01.fit_a_line/image/formula_fit_a_line_3.png?raw=true" width=200><br/>
+</p>
 
 That is, for a test set of size $n$, $MSE$ is the mean of the squared prediction errors over the $n$ samples.
 
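 The loss function is usually optimized with gradient descent, a first-order optimization algorithm. If $f(x)$ is defined and differentiable at a point $x_n$, then $f(x)$ decreases fastest at $x_n$ along the negative gradient direction $-\nabla f(x_n)$. We repeatedly adjust $x$ so that $f(x)$ approaches a global or local minimum, using the update: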
 
-$$x_n+1=x_n-λ▽f(x), n≧0$$
+<p align="center">
+    <img src = "https://github.com/ceci3/book/blob/update_fit_a_line/01.fit_a_line/image/formula_fit_a_line_4.png?raw=true" width=250><br/>
+</p>
 
 where λ is the learning rate. This update procedure is called gradient descent.
 
@@ -355,7 +372,7 @@ with fluid.scope_guard(inference_scope):
 
     save_result(results[0], infer_label) # save the image
 ```
-
+Because each iteration trains on a randomly selected minibatch, the predictions will differ from one run to the next.
 
 
 ## Summary
@@ -369,4 +386,4 @@ with fluid.scope_guard(inference_scope):
 4. Bishop C M. Pattern recognition[J]. Machine Learning, 2006, 128.
 
 <br/>
-<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
+<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://www.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
diff --git a/01.fit_a_line/image/formula_fit_a_line_1.png b/01.fit_a_line/image/formula_fit_a_line_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..6f43fb6a7ec96aaf1fc8cf6b6f239561c0a677b2
Binary files /dev/null and b/01.fit_a_line/image/formula_fit_a_line_1.png differ
diff --git a/01.fit_a_line/image/formula_fit_a_line_2.png b/01.fit_a_line/image/formula_fit_a_line_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..a665fc60aeaa6e1a5c328f06b07f39e4f2af78c8
Binary files /dev/null and b/01.fit_a_line/image/formula_fit_a_line_2.png differ
diff --git a/01.fit_a_line/image/formula_fit_a_line_3.png b/01.fit_a_line/image/formula_fit_a_line_3.png
new file mode 100644
index 0000000000000000000000000000000000000000..97a242ca5065128969d0d0d749a053d5640270ff
Binary files /dev/null and b/01.fit_a_line/image/formula_fit_a_line_3.png differ
diff --git a/01.fit_a_line/image/formula_fit_a_line_4.png b/01.fit_a_line/image/formula_fit_a_line_4.png
new file mode 100644
index 0000000000000000000000000000000000000000..6979b6cda7c40a68204f7e963d6cdf0dbba01dd2
Binary files /dev/null and b/01.fit_a_line/image/formula_fit_a_line_4.png differ
diff --git a/01.fit_a_line/index.cn.html b/01.fit_a_line/index.cn.html
index 61c970c535189a3681cae7e348928ccf286824b9..6e6ea06c2f5bca023ade70329ce9f79fc49053b3 100644
--- a/01.fit_a_line/index.cn.html
+++ b/01.fit_a_line/index.cn.html
@@ -45,10 +45,21 @@
 
 The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line). If you are new to the book, please see the [Book usage instructions](https://github.com/PaddlePaddle/book/blob/develop/README.cn.md#运行这本书).
 
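+### Notes: ###
+1. Hardware requirements:
+This tutorial can run on either CPU or GPU.
+2. CUDA/cuDNN versions supported by the Docker image:
+If you run the Book with Docker, note that the default image provides a GPU environment with CUDA 8/cuDNN 5. On GPUs that require CUDA 9, such as the NVIDIA Tesla V100, this image may fail to run.
+3. Consistency between the code in this document and the script:
+Note: to make this tutorial easier to read and use, we split and rearranged the code from train.py and included it here. The code in this tutorial produces the same results as train.py; you can run [train.py](https://github.com/PaddlePaddle/book/blob/develop/01.fit_a_line/train.py) directly to verify this.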
+
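 ## Background
 Given a dataset of size $n$, ${\{y_{i}, x_{i1}, \ldots, x_{id}\}}_{i=1}^{n}$, where $x_{i1}, \ldots, x_{id}$ are the values of the $d$ attributes of the $i$-th sample and $y_i$ is the target to be predicted for that sample, the linear regression model assumes that $y_i$ can be described by a linear combination of the attributes, that is: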
 
-$$y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b,  i=1,\ldots,n$$
+
+<p align="center">
+    <img src = "https://github.com/ceci3/book/blob/update_fit_a_line/01.fit_a_line/image/formula_fit_a_line_1.png?raw=true" width=550><br/>
+</p>
 
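 For example, in the house price prediction problem we are going to model, $x_{ij}$ are the attributes describing house $i$ (such as the number of rooms, the number of nearby schools and hospitals, and traffic conditions), and $y_i$ is the price of the house.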
 
@@ -67,7 +78,9 @@ $$y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b,  i=1,\ldo
 
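 In the Boston house price dataset, each record has 14 values: the first 13 describe various attributes of the houses, i.e. the $x_i$ in the model; the last is the median price of houses of that type, which we want to predict, i.e. the $y_i$ in the model. Our model can therefore be written as: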
 
-$$\hat{Y} = \omega_1X_{1} + \omega_2X_{2} + \ldots + \omega_{13}X_{13} + b$$
+<p align="center">
+    <img src = "https://github.com/ceci3/book/blob/update_fit_a_line/01.fit_a_line/image/formula_fit_a_line_2.png?raw=true" width=350><br/>
+</p>
 
 $\hat{Y}$ denotes the model's prediction, to distinguish it from the true value $Y$. The parameters to be learned are $\omega_1, \ldots, \omega_{13}, b$.
 
@@ -75,13 +88,17 @@ $\hat{Y}$ 表示模型的预测结果，用来和真实值$Y$区分。模型要
 
 For linear regression, the most common loss function is the mean squared error ([MSE](https://en.wikipedia.org/wiki/Mean_squared_error)), which has the form:
 
-$$MSE=\frac{1}{n}\sum_{i=1}^{n}{(\hat{Y_i}-Y_i)}^2$$
+<p align="center">
+    <img src = "https://github.com/ceci3/book/blob/update_fit_a_line/01.fit_a_line/image/formula_fit_a_line_3.png?raw=true" width=200><br/>
+</p>
 
 That is, for a test set of size $n$, $MSE$ is the mean of the squared prediction errors over the $n$ samples.
 
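 The loss function is usually optimized with gradient descent, a first-order optimization algorithm. If $f(x)$ is defined and differentiable at a point $x_n$, then $f(x)$ decreases fastest at $x_n$ along the negative gradient direction $-\nabla f(x_n)$. We repeatedly adjust $x$ so that $f(x)$ approaches a global or local minimum, using the update: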
 
-$$x_n+1=x_n-λ▽f(x), n≧0$$
+<p align="center">
+    <img src = "https://github.com/ceci3/book/blob/update_fit_a_line/01.fit_a_line/image/formula_fit_a_line_4.png?raw=true" width=250><br/>
+</p>
 
 where λ is the learning rate. This update procedure is called gradient descent.
 
@@ -397,7 +414,7 @@ with fluid.scope_guard(inference_scope):
 
     save_result(results[0], infer_label) # save the image
 ```
-
+Because each iteration trains on a randomly selected minibatch, the predictions will differ from one run to the next.
 
 
 ## Summary
@@ -411,7 +428,7 @@ with fluid.scope_guard(inference_scope):
 4. Bishop C M. Pattern recognition[J]. Machine Learning, 2006, 128.
 
 <br/>
-<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
+<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://www.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
 
 </div>
 <!-- You can change the lines below now. -->