diff --git a/recognize_digits/README.en.md b/recognize_digits/README.en.md
index adb49b76cc8e9f612f7e7588829b4668a2706041..9bc4a2ba3e22af97eb0a9aecc87ca244850d4d25 100644
--- a/recognize_digits/README.en.md
+++ b/recognize_digits/README.en.md
@@ -32,15 +32,15 @@ In a simple softmax regression model, the input is fed to fully connected layers
 
 Input $X$ is multiplied with weights $W$, and bias $b$ is added to generate activations.
 
-$$ y_i = softmax(\sum_j W_{i,j}x_j + b_i) $$
+$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$
 
-where $ softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
+where $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
 
 For an $N$ class classification problem with $N$ output nodes, an $N$ dimensional vector is normalized to $N$ real values in the range [0, 1], each representing the probability of the sample to belong to the class. Here $y_i$ is the prediction probability that an image is digit $i$.
 
 In such a classification problem, we usually use the cross entropy loss function:
 
-$$  crossentropy(label, y) = -\sum_i label_ilog(y_i) $$
+$$  \text{crossentropy}(label, y) = -\sum_i label_i \log(y_i) $$
 
 Fig. 2 shows a softmax regression network, with weights in blue, and bias in red. +1 indicates bias is 1.
 
@@ -55,7 +55,7 @@ The Softmax regression model described above uses the simplest two-layer neural
 
 1.  After the first hidden layer, we get $ H_1 = \phi(W_1X + b_1) $, where $\phi$ is the activation function. Some common ones are sigmoid, tanh and ReLU.
 2.  After the second hidden layer, we get $ H_2 = \phi(W_2H_1 + b_2) $.
-3.  Finally, after output layer, we get $Y=softmax(W_3H_2 + b_3)$, the final classification result vector.
+3.  Finally, after output layer, we get $Y=\text{softmax}(W_3H_2 + b_3)$, the final classification result vector.
 
 Fig. 3. is Multilayer Perceptron network, with weights in blue, and bias in red. +1 indicates bias is 1.
 
diff --git a/recognize_digits/README.md b/recognize_digits/README.md
index b3639b769e34c15b43cc4f979d0b024df09d1eb7..e0472b68dadd15ddea4d07b5a380918af711e331 100644
--- a/recognize_digits/README.md
+++ b/recognize_digits/README.md
@@ -32,15 +32,15 @@ Yann LeCun早先在手写字符识别上做了很多研究，并在研究过程
 
 输入层的数据$X$传到输出层，在激活操作之前，会乘以相应的权重 $W$ ，并加上偏置变量 $b$ ，具体如下：
 
-$$ y_i = softmax(\sum_j W_{i,j}x_j + b_i) $$
+$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$
 
-其中 $ softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
+其中 $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
 
 对于有 $N$ 个类别的多分类问题，指定 $N$ 个输出节点，$N$ 维输入特征经过softmax将归一化为 $N$ 个[0,1]范围内的实数值，分别表示该样本属于这 $N$ 个类别的概率。此处的 $y_i$ 即对应该图片为数字 $i$ 的预测概率。
 
 在分类问题中，我们一般采用交叉熵代价损失函数（cross entropy），公式如下：
 
-$$  crossentropy(label, y) = -\sum_i label_ilog(y_i) $$
+$$  \text{crossentropy}(label, y) = -\sum_i label_i \log(y_i) $$
 
 图2为softmax回归的网络图，图中权重用蓝线表示、偏置用红线表示、+1代表偏置参数的系数为1。
 
@@ -55,7 +55,7 @@ Softmax回归模型采用了最简单的两层神经网络，即只有输入层
 
 1.  经过第一个隐藏层，可以得到 $ H_1 = \phi(W_1X + b_1) $，其中$\phi$代表激活函数，常见的有sigmoid、tanh或ReLU等函数。
 2.  经过第二个隐藏层，可以得到 $ H_2 = \phi(W_2H_1 + b_2) $。
-3.  最后，再经过输出层，得到的$Y=softmax(W_3H_2 + b_3)$，即为最后的分类结果向量。
+3.  最后，再经过输出层，得到的$Y=\text{softmax}(W_3H_2 + b_3)$，即为最后的分类结果向量。
 
 
 图3为多层感知器的网络结构图，图中权重用蓝线表示、偏置用红线表示、+1代表偏置参数的系数为1。
diff --git a/recognize_digits/index.en.html b/recognize_digits/index.en.html
index 117d39fcc775d48bbd23e0a8b33956213b9d7f00..e7fcf9de884b1ad981082426a169e2e706e0cb5f 100644
--- a/recognize_digits/index.en.html
+++ b/recognize_digits/index.en.html
@@ -74,15 +74,15 @@ In a simple softmax regression model, the input is fed to fully connected layers
 
 Input $X$ is multiplied with weights $W$, and bias $b$ is added to generate activations.
 
-$$ y_i = softmax(\sum_j W_{i,j}x_j + b_i) $$
+$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$
 
-where $ softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
+where $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
 
 For an $N$ class classification problem with $N$ output nodes, an $N$ dimensional vector is normalized to $N$ real values in the range [0, 1], each representing the probability of the sample to belong to the class. Here $y_i$ is the prediction probability that an image is digit $i$.
 
 In such a classification problem, we usually use the cross entropy loss function:
 
-$$  crossentropy(label, y) = -\sum_i label_ilog(y_i) $$
+$$  \text{crossentropy}(label, y) = -\sum_i label_i \log(y_i) $$
 
 Fig. 2 shows a softmax regression network, with weights in blue, and bias in red. +1 indicates bias is 1.
 
@@ -97,7 +97,7 @@ The Softmax regression model described above uses the simplest two-layer neural
 
 1.  After the first hidden layer, we get $ H_1 = \phi(W_1X + b_1) $, where $\phi$ is the activation function. Some common ones are sigmoid, tanh and ReLU.
 2.  After the second hidden layer, we get $ H_2 = \phi(W_2H_1 + b_2) $.
-3.  Finally, after output layer, we get $Y=softmax(W_3H_2 + b_3)$, the final classification result vector.
+3.  Finally, after output layer, we get $Y=\text{softmax}(W_3H_2 + b_3)$, the final classification result vector.
 
 Fig. 3. is Multilayer Perceptron network, with weights in blue, and bias in red. +1 indicates bias is 1.
 
diff --git a/recognize_digits/index.html b/recognize_digits/index.html
index 87dd91ecb39e544d843889016e1a0b1530197250..8260711a1f97ed842e920ce1f999262f06aae97a 100644
--- a/recognize_digits/index.html
+++ b/recognize_digits/index.html
@@ -74,15 +74,15 @@ Yann LeCun早先在手写字符识别上做了很多研究，并在研究过程
 
 输入层的数据$X$传到输出层，在激活操作之前，会乘以相应的权重 $W$ ，并加上偏置变量 $b$ ，具体如下：
 
-$$ y_i = softmax(\sum_j W_{i,j}x_j + b_i) $$
+$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$
 
-其中 $ softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
+其中 $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $
 
 对于有 $N$ 个类别的多分类问题，指定 $N$ 个输出节点，$N$ 维输入特征经过softmax将归一化为 $N$ 个[0,1]范围内的实数值，分别表示该样本属于这 $N$ 个类别的概率。此处的 $y_i$ 即对应该图片为数字 $i$ 的预测概率。
 
 在分类问题中，我们一般采用交叉熵代价损失函数（cross entropy），公式如下：
 
-$$  crossentropy(label, y) = -\sum_i label_ilog(y_i) $$
+$$  \text{crossentropy}(label, y) = -\sum_i label_i \log(y_i) $$
 
 图2为softmax回归的网络图，图中权重用蓝线表示、偏置用红线表示、+1代表偏置参数的系数为1。
 
@@ -97,7 +97,7 @@ Softmax回归模型采用了最简单的两层神经网络，即只有输入层
 
 1.  经过第一个隐藏层，可以得到 $ H_1 = \phi(W_1X + b_1) $，其中$\phi$代表激活函数，常见的有sigmoid、tanh或ReLU等函数。
 2.  经过第二个隐藏层，可以得到 $ H_2 = \phi(W_2H_1 + b_2) $。
-3.  最后，再经过输出层，得到的$Y=softmax(W_3H_2 + b_3)$，即为最后的分类结果向量。
+3.  最后，再经过输出层，得到的$Y=\text{softmax}(W_3H_2 + b_3)$，即为最后的分类结果向量。
 
 
 图3为多层感知器的网络结构图，图中权重用蓝线表示、偏置用红线表示、+1代表偏置参数的系数为1。
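
Review note on the recognize_digits formulas: the hunks above only change how the function names are typeset, so a small numeric check may help confirm the formulas read as intended. The following is a minimal sketch in plain Python, not the tutorial's Paddle code. The helper names `dense`, `softmax` and `cross_entropy` and the toy weights are assumptions made for illustration.

```python
import math


def dense(weights, bias, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Map real scores to values in [0, 1] that sum to 1."""
    # Subtracting the maximum does not change the result and avoids overflow.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(label, y):
    """Return -sum_i label_i * log(y_i); zero-weight terms are skipped."""
    return -sum(l * math.log(p) for l, p in zip(label, y) if l > 0)


# Toy example (made-up numbers): 2 input features, 3 classes.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]
x = [1.0, 2.0]
y = softmax(dense(W, b, x))          # y_i = softmax(sum_j W_ij x_j + b_i)
loss = cross_entropy([0, 0, 1], y)   # one-hot label for class 2
```

With a one-hot label the sum collapses to a single term, so the loss is simply the negative log of the probability assigned to the true class.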
diff --git a/understand_sentiment/README.md b/understand_sentiment/README.md
index 1dfa46d1a90b61c79d5500b3d90c9baf3ab30df8..3e437f2d5f68e3cb163b0864f92d251ae6b603f3 100644
--- a/understand_sentiment/README.md
+++ b/understand_sentiment/README.md
@@ -231,9 +231,9 @@ if __name__ == '__main__':
 ```
 这里，`dataset.imdb.train()`和`dataset.imdb.test()`分别是`dataset.imdb`中的训练数据和测试数据API。`train_reader`在训练时使用，意义是将读取的训练数据进行shuffle后，组成一个batch数据。同理，`test_reader`是在测试的时候使用，将读取的测试数据组成一个batch。
 ```
-    reader_dict={'word': 0, 'label': 1}
+    feeding={'word': 0, 'label': 1}
 ```
-`reader_dict`用来指定`train_reader`和`test_reader`返回的数据与模型配置中data_layer的对应关系。这里表示reader返回的第0列数据对应`word`层，第1列数据对应`label`层。
+`feeding`用来指定`train_reader`和`test_reader`返回的数据与模型配置中data_layer的对应关系。这里表示reader返回的第0列数据对应`word`层，第1列数据对应`label`层。
 ### 构造模型
 ```
     # Please choose the way to build the network
@@ -270,7 +270,7 @@ Paddle中提供了一系列优化算法的API，这里使用Adam优化算法。
                 sys.stdout.write('.')
                 sys.stdout.flush()
         if isinstance(event, paddle.event.EndPass):
-            result = trainer.test(reader=test_reader, reader_dict=reader_dict)
+            result = trainer.test(reader=test_reader, feeding=feeding)
             print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
 ```
 可以通过给train函数传递一个`event_handler`来获取每个batch和每个pass结束的状态。比如构造如下一个`event_handler`可以在每100个batch结束后输出cost和error；在每个pass结束后调用`trainer.test`计算一遍测试集并获得当前模型在测试集上的error。
@@ -283,7 +283,7 @@ Paddle中提供了一系列优化算法的API，这里使用Adam优化算法。
     trainer.train(
         reader=train_reader,
         event_handler=event_handler,
-        reader_dict=reader_dict,
+        feeding=feeding,
         num_passes=2)
 ```
 程序运行之后的输出如下。
diff --git a/understand_sentiment/index.html b/understand_sentiment/index.html
index ca8e8180f27b993592d6602cea0cf203b9afc393..ced0f2586873f7316df30710a81db8da6285d10e 100644
--- a/understand_sentiment/index.html
+++ b/understand_sentiment/index.html
@@ -273,9 +273,9 @@ if __name__ == '__main__':
 ```
 这里，`dataset.imdb.train()`和`dataset.imdb.test()`分别是`dataset.imdb`中的训练数据和测试数据API。`train_reader`在训练时使用，意义是将读取的训练数据进行shuffle后，组成一个batch数据。同理，`test_reader`是在测试的时候使用，将读取的测试数据组成一个batch。
 ```
-    reader_dict={'word': 0, 'label': 1}
+    feeding={'word': 0, 'label': 1}
 ```
-`reader_dict`用来指定`train_reader`和`test_reader`返回的数据与模型配置中data_layer的对应关系。这里表示reader返回的第0列数据对应`word`层，第1列数据对应`label`层。
+`feeding`用来指定`train_reader`和`test_reader`返回的数据与模型配置中data_layer的对应关系。这里表示reader返回的第0列数据对应`word`层，第1列数据对应`label`层。
 ### 构造模型
 ```
     # Please choose the way to build the network
@@ -312,7 +312,7 @@ Paddle中提供了一系列优化算法的API，这里使用Adam优化算法。
                 sys.stdout.write('.')
                 sys.stdout.flush()
         if isinstance(event, paddle.event.EndPass):
-            result = trainer.test(reader=test_reader, reader_dict=reader_dict)
+            result = trainer.test(reader=test_reader, feeding=feeding)
             print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
 ```
 可以通过给train函数传递一个`event_handler`来获取每个batch和每个pass结束的状态。比如构造如下一个`event_handler`可以在每100个batch结束后输出cost和error；在每个pass结束后调用`trainer.test`计算一遍测试集并获得当前模型在测试集上的error。
@@ -325,7 +325,7 @@ Paddle中提供了一系列优化算法的API，这里使用Adam优化算法。
     trainer.train(
         reader=train_reader,
         event_handler=event_handler,
-        reader_dict=reader_dict,
+        feeding=feeding,
         num_passes=2)
 ```
 程序运行之后的输出如下。
diff --git a/understand_sentiment/train.py b/understand_sentiment/train.py
index 81ff4a0b214b29cde4623260360e362b776aaa24..1c856556bd0cb32f60eba322469b3621c37e1349 100644
--- a/understand_sentiment/train.py
+++ b/understand_sentiment/train.py
@@ -117,7 +117,7 @@ if __name__ == '__main__':
     test_reader = paddle.batch(
         lambda: paddle.dataset.imdb.test(word_dict), batch_size=100)
 
-    reader_dict = {'word': 0, 'label': 1}
+    feeding = {'word': 0, 'label': 1}
 
     # network config
     # Please choose the way to build the network
@@ -144,7 +144,7 @@ if __name__ == '__main__':
                 sys.stdout.write('.')
                 sys.stdout.flush()
         if isinstance(event, paddle.event.EndPass):
-            result = trainer.test(reader=test_reader, reader_dict=reader_dict)
+            result = trainer.test(reader=test_reader, feeding=feeding)
             print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
 
     # create trainer
@@ -155,5 +155,5 @@ if __name__ == '__main__':
     trainer.train(
         reader=train_reader,
         event_handler=event_handler,
-        reader_dict=reader_dict,
+        feeding=feeding,
         num_passes=2)
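
Review note on the `reader_dict` to `feeding` rename: the README text says the mapping ties each column returned by the reader to a data layer by name. The sketch below illustrates that idea in plain Python only. `arrange_by_feeding` is a hypothetical helper and the sample values are made up; this is not how Paddle implements the lookup internally.

```python
def arrange_by_feeding(sample, feeding):
    """Pick each layer's column out of one reader sample.

    `feeding` maps a data layer name to the index of the column
    that the reader returns for it.
    """
    return {name: sample[column] for name, column in feeding.items()}


feeding = {'word': 0, 'label': 1}
sample = ([12, 7, 431], 1)  # made-up (word ids, label) tuple
fed = arrange_by_feeding(sample, feeding)
```

Here column 0 of the sample goes to the `word` layer and column 1 to the `label` layer, which matches the description in the README hunk.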