diff --git a/02.recognize_digits/README.cn.md b/02.recognize_digits/README.cn.md
index bf7b5ce4246669d151bb1e0b45e608cf1a6f3f7e..110ee1219903f867199ee4115583db2c7b7aca7d 100644
--- a/02.recognize_digits/README.cn.md
+++ b/02.recognize_digits/README.cn.md
@@ -139,7 +139,7 @@ PaddlePaddle在API中提供了自动加载[MNIST](http://yann.lecun.com/exdb/mni
 1. `train_program`：指定如何从 `inference_program` 和`标签值`中获取 `loss` 的函数。
 这是指定损失计算的地方。
 
-1. `optimizer`: 配置如何最小化损失。PaddlePaddle 支持最主要的优化方法。
+1. `optimizer_func`: 指定优化器配置的函数。优化器负责减少损失并驱动训练。Paddle 支持多种不同的优化器。
 
 1. `Trainer`：PaddlePaddle Trainer 管理由 `train_program` 和 `optimizer` 指定的训练过程。
 通过 `event_handler` 回调函数，用户可以监控培训的进展。
@@ -238,6 +238,15 @@ def train_program():
 # 该模型运行在单个CPU上
 ```
 
+#### Optimizer Function 配置
+
+在下面的 `Adam` 优化器中，`learning_rate` 是学习率，它影响网络训练的收敛速度。
+
+```python
+def optimizer_program():
+    return fluid.optimizer.Adam(learning_rate=0.001)
+```
+
 ### 数据集 Feeders 配置
 
 下一步，我们开始训练过程。`paddle.dataset.movielens.train()`和`paddle.dataset.movielens.test()`分别做训练和测试数据集。这两个函数各自返回一个reader——PaddlePaddle中的reader是一个Python函数，每次调用的时候返回一个Python yield generator。
@@ -259,16 +268,14 @@ test_reader = paddle.batch(
 ### Trainer 配置
 
 现在，我们需要配置 `Trainer`。`Trainer` 需要接受训练程序 `train_program`, `place` 和优化器 `optimizer`。
-在下面的 `Adam optimizer`，`learning_rate` 是训练的速度，与网络的训练收敛速度有关系。
 
 ```python
 # 该模型运行在单个CPU上
 use_cuda = False # set to True if training with GPU
 place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
-optimizer = fluid.optimizer.Adam(learning_rate=0.001)
 
 trainer = fluid.Trainer(
-    train_func=train_program, place=place, optimizer=optimizer)
+    train_func=train_program, place=place, optimizer_func=optimizer_program)
  ```
 
 #### Event Handler 配置
diff --git a/02.recognize_digits/README.md b/02.recognize_digits/README.md
index 66e526bded114f84adfa3ad8d1beeea982ac333e..7c6792ff578b3c1626361a0fe80e123024eae82b 100644
--- a/02.recognize_digits/README.md
+++ b/02.recognize_digits/README.md
@@ -146,7 +146,7 @@ Here are the quick overview on the major fluid API complements.
 This is where you specify the network flow.
 1. `train_program`: A function that specify how to get avg_cost from `inference_program` and labels.
 This is where you specify the loss calculations.
-1. `optimizer`: Configure how to minimize the loss. Paddle supports most major optimization methods.
+1. `optimizer_func`: A function that specifies the configuration of the optimizer. The optimizer is responsible for minimizing the loss and driving the training. Paddle supports many different optimizers.
 1. `Trainer`: Fluid trainer manages the training process specified by the `train_program` and `optimizer`. Users can monitor the training
 progress through the `event_handler` callback function.
 1. `Inferencer`: Fluid inferencer loads the `inference_program` and the parameters trained by the Trainer.
@@ -245,6 +245,15 @@ def train_program():
     return [avg_cost, acc]
 ```
 
+#### Optimizer Function Configuration
+
+In the following `Adam` optimizer, `learning_rate` specifies the learning rate in the optimization procedure.
+
+```python
+def optimizer_program():
+    return fluid.optimizer.Adam(learning_rate=0.001)
+```
+
 ### Data Feeders Configuration
 
 Then we specify the training data `paddle.dataset.mnist.train()` and testing data `paddle.dataset.mnist.test()`. These two methods are *reader creators*. Once called, a reader creator returns a *reader*.  A reader is a Python method, which, once called, returns a Python generator, which yields instances of data.
@@ -266,15 +275,13 @@ test_reader = paddle.batch(
 ### Trainer Configuration
 
 Now, we need to setup the trainer. The trainer need to take in `train_program`, `place`, and `optimizer`.
-In the following `Adam` optimizer, `learning_rate` means the speed at which the network training converges.
 
 ```python
 use_cuda = False # set to True if training with GPU
 place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
-optimizer = fluid.optimizer.Adam(learning_rate=0.001)
 
 trainer = fluid.Trainer(
-    train_func=train_program, place=place, optimizer=optimizer)
+    train_func=train_program, place=place, optimizer_func=optimizer_program)
  ```
 
 #### Event Handler
diff --git a/02.recognize_digits/index.cn.html b/02.recognize_digits/index.cn.html
index 482892985c5c28c21102f82af22192e1d78f1d34..6a7e018ce26d66cffe5676381cb37e551fe44adc 100644
--- a/02.recognize_digits/index.cn.html
+++ b/02.recognize_digits/index.cn.html
@@ -181,7 +181,7 @@ PaddlePaddle在API中提供了自动加载[MNIST](http://yann.lecun.com/exdb/mni
 1. `train_program`：指定如何从 `inference_program` 和`标签值`中获取 `loss` 的函数。
 这是指定损失计算的地方。
 
-1. `optimizer`: 配置如何最小化损失。PaddlePaddle 支持最主要的优化方法。
+1. `optimizer_func`: 指定优化器配置的函数。优化器负责减少损失并驱动训练。Paddle 支持多种不同的优化器。
 
 1. `Trainer`：PaddlePaddle Trainer 管理由 `train_program` 和 `optimizer` 指定的训练过程。
 通过 `event_handler` 回调函数，用户可以监控培训的进展。
@@ -280,6 +280,15 @@ def train_program():
 # 该模型运行在单个CPU上
 ```
 
+#### Optimizer Function 配置
+
+在下面的 `Adam` 优化器中，`learning_rate` 是学习率，它影响网络训练的收敛速度。
+
+```python
+def optimizer_program():
+    return fluid.optimizer.Adam(learning_rate=0.001)
+```
+
 ### 数据集 Feeders 配置
 
 下一步，我们开始训练过程。`paddle.dataset.movielens.train()`和`paddle.dataset.movielens.test()`分别做训练和测试数据集。这两个函数各自返回一个reader——PaddlePaddle中的reader是一个Python函数，每次调用的时候返回一个Python yield generator。
@@ -301,16 +310,14 @@ test_reader = paddle.batch(
 ### Trainer 配置
 
 现在，我们需要配置 `Trainer`。`Trainer` 需要接受训练程序 `train_program`, `place` 和优化器 `optimizer`。
-在下面的 `Adam optimizer`，`learning_rate` 是训练的速度，与网络的训练收敛速度有关系。
 
 ```python
 # 该模型运行在单个CPU上
 use_cuda = False # set to True if training with GPU
 place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
-optimizer = fluid.optimizer.Adam(learning_rate=0.001)
 
 trainer = fluid.Trainer(
-    train_func=train_program, place=place, optimizer=optimizer)
+    train_func=train_program, place=place, optimizer_func=optimizer_program)
  ```
 
 #### Event Handler 配置
diff --git a/02.recognize_digits/index.html b/02.recognize_digits/index.html
index c83d5bcea3676d731af7d73fc409fe804a6b993e..800113c289b93fec49fe7dd66012381e29627363 100644
--- a/02.recognize_digits/index.html
+++ b/02.recognize_digits/index.html
@@ -188,7 +188,7 @@ Here are the quick overview on the major fluid API complements.
 This is where you specify the network flow.
 1. `train_program`: A function that specify how to get avg_cost from `inference_program` and labels.
 This is where you specify the loss calculations.
-1. `optimizer`: Configure how to minimize the loss. Paddle supports most major optimization methods.
+1. `optimizer_func`: A function that specifies the configuration of the optimizer. The optimizer is responsible for minimizing the loss and driving the training. Paddle supports many different optimizers.
 1. `Trainer`: Fluid trainer manages the training process specified by the `train_program` and `optimizer`. Users can monitor the training
 progress through the `event_handler` callback function.
 1. `Inferencer`: Fluid inferencer loads the `inference_program` and the parameters trained by the Trainer.
@@ -287,6 +287,15 @@ def train_program():
     return [avg_cost, acc]
 ```
 
+#### Optimizer Function Configuration
+
+In the following `Adam` optimizer, `learning_rate` specifies the learning rate in the optimization procedure.
+
+```python
+def optimizer_program():
+    return fluid.optimizer.Adam(learning_rate=0.001)
+```
+
 ### Data Feeders Configuration
 
 Then we specify the training data `paddle.dataset.mnist.train()` and testing data `paddle.dataset.mnist.test()`. These two methods are *reader creators*. Once called, a reader creator returns a *reader*.  A reader is a Python method, which, once called, returns a Python generator, which yields instances of data.
@@ -308,15 +317,13 @@ test_reader = paddle.batch(
 ### Trainer Configuration
 
 Now, we need to setup the trainer. The trainer need to take in `train_program`, `place`, and `optimizer`.
-In the following `Adam` optimizer, `learning_rate` means the speed at which the network training converges.
 
 ```python
 use_cuda = False # set to True if training with GPU
 place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
-optimizer = fluid.optimizer.Adam(learning_rate=0.001)
 
 trainer = fluid.Trainer(
-    train_func=train_program, place=place, optimizer=optimizer)
+    train_func=train_program, place=place, optimizer_func=optimizer_program)
  ```
 
 #### Event Handler
diff --git a/02.recognize_digits/train.py b/02.recognize_digits/train.py
index d9947e4eb0159fda5539333b83afd5d44d62305f..8b91432e4a64524ccf9b0aee102b8e0cc17f8110 100644
--- a/02.recognize_digits/train.py
+++ b/02.recognize_digits/train.py
@@ -62,6 +62,10 @@ def train_program():
     return [avg_cost, acc]
 
 
+def optimizer_program():
+    return fluid.optimizer.Adam(learning_rate=0.001)
+
+
 def main():
     train_reader = paddle.batch(
         paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=500),
@@ -71,10 +75,9 @@ def main():
 
     use_cuda = os.getenv('WITH_GPU', '0') != '0'
     place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
-    optimizer = fluid.optimizer.Adam(learning_rate=0.001)
 
     trainer = fluid.Trainer(
-        train_func=train_program, place=place, optimizer=optimizer)
+        train_func=train_program, place=place, optimizer_func=optimizer_program)
 
     # Save the parameter into a directory. The Inferencer can load the parameters from it to do infer
     params_dirname = "recognize_digits_network.inference.model"