The :ref:`api_fluid_CompiledProgram` is used to transform a program for various optimizations. For example, you can use :code:`with_data_parallel` to transform the program into a data-parallel program, so that it can be run on multiple devices.

.. code-block:: python

    # Note:
    #   - If you want to specify the GPU cards used by ParallelExecutor,
    #     you should define CUDA_VISIBLE_DEVICES in the environment.
    #   - If you want to use multiple CPUs to run the program in
    #     ParallelExecutor, you should define CPU_NUM in the environment.

    # First create the Executor.
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    exe = fluid.Executor(place)

    # Run the startup program once and only once.
    exe.run(fluid.default_startup_program())

    # Run the main program directly, without compiling.
    loss_value, = exe.run(fluid.default_main_program(),
                          feed=feed_dict,
                          fetch_list=[loss.name])

    # Or, compile the program first, and then run the model with data parallel.
    exec_strategy = fluid.ExecutionStrategy()
    exec_strategy.num_threads = dev_count * 4  # the size of the thread pool
    build_strategy = fluid.BuildStrategy()
    build_strategy.memory_optimize = True if memory_opt else False
    compiled_prog = fluid.compiler.CompiledProgram(
        fluid.default_main_program()).with_data_parallel(
            loss_name=loss.name,
            build_strategy=build_strategy,
            exec_strategy=exec_strategy)
    loss_value, = exe.run(compiled_prog,
                          feed=feed_dict,
                          fetch_list=[loss.name])
:code:`Executor` implements a simple executor in which all operators are executed in order. You can run :code:`Executor` in a Python script. There are two kinds of executors in PaddlePaddle Fluid: one is the single-thread executor, which is the default option for :code:`Executor`, and the other is the parallel executor, which is illustrated in :ref:`api_guide_parallel_executor_en`. Because :code:`Executor` and :ref:`api_guide_parallel_executor_en` are configured differently, which may be confusing for some users, we introduce :ref:`api_guide_compiled_program_en` to make the executor easier to use: :ref:`api_guide_compiled_program_en` transforms a program for various optimizations, and the transformed program can be run by :code:`Executor`.
The logic of :code:`Executor` is very simple. It is suggested to thoroughly run the model with :code:`Executor` on a single device during the debugging phase, and then switch to multi-device or multi-machine mode for training.
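This single-device-first workflow can be kept in one script by gating on a flag. The sketch below is plain Python; the :code:`PARALLEL` variable and :code:`choose_run_mode` helper are made up for illustration and are not part of the Fluid API:

.. code-block:: python

    import os

    def choose_run_mode():
        # Debug on a single device first; flip the (hypothetical)
        # PARALLEL flag only after the model runs correctly.
        if os.environ.get("PARALLEL", "0") == "1":
            # Multi-device phase: e.g. compile with with_data_parallel().
            return "parallel"
        # Debugging phase: plain Executor on one device.
        return "single"

With this gate, the same entry point serves both phases, and only the branch taken changes.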
:code:`Executor` receives a :code:`Place` at construction, which can be either :ref:`api_fluid_CPUPlace` or :ref:`api_fluid_CUDAPlace`.

.. code-block:: python

    # First create the Executor.
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    exe = fluid.Executor(place)

    # Run the startup program once and only once.
    exe.run(fluid.default_startup_program())

    # Run the main program directly.
    loss_value, = exe.run(fluid.default_main_program(),
                          feed=feed_dict,
                          fetch_list=[loss.name])
For a simple example, please refer to `quick_start_fit_a_line <http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/quick_start/fit_a_line/README.html>`_ .
For API Reference, please refer to :ref:`api_fluid_Executor`.
Since the execution speed of the model is related to both the model structure and the executor's execution strategy, :code:`ParallelExecutor` allows you to modify the relevant parameters of the executor, such as the size of the thread pool ( :code:`num_threads` ) and how many iterations should pass before temporary variables are cleaned up ( :code:`num_iteration_per_drop_scope` ). For more information, please refer to :ref:`api_fluid_ExecutionStrategy`.

.. code-block:: python

    # Note:
    #   - If you want to specify the GPU cards used by ParallelExecutor,
    #     you should define CUDA_VISIBLE_DEVICES in the environment.
    #   - If you want to use multiple CPUs to run the program in
    #     ParallelExecutor, you should define CPU_NUM in the environment.

    # First create the Executor.
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    exe = fluid.Executor(place)

    # Run the startup program once and only once.
    exe.run(fluid.default_startup_program())

    # Define train_exe and test_exe.
    exec_strategy = fluid.ExecutionStrategy()
    exec_strategy.num_threads = dev_count * 4  # the size of the thread pool
    build_strategy = fluid.BuildStrategy()
    build_strategy.memory_optimize = True if memory_opt else False
    train_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
                                       main_program=train_program,
                                       build_strategy=build_strategy,
                                       exec_strategy=exec_strategy,
                                       loss_name=loss.name)
    test_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
                                      main_program=test_program,
                                      share_vars_from=train_exe,
                                      build_strategy=build_strategy,
                                      exec_strategy=exec_strategy)
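The tuning arithmetic above can be factored out ahead of time. This is a plain-Python sketch; the :code:`executor_tuning` helper is hypothetical, and the default of 100 iterations per scope drop is an illustrative choice, not Fluid's default:

.. code-block:: python

    def executor_tuning(dev_count, drop_scope_every=100):
        """Return the executor tuning values described in the text.

        dev_count: number of devices (GPU cards, or CPU_NUM for CPU runs).
        drop_scope_every: iterations between temporary-variable cleanups;
            an illustrative default, not Fluid's own.
        """
        return {
            # Thread-pool size: four worker threads per device.
            "num_threads": dev_count * 4,
            # How often temporary scopes are dropped.
            "num_iteration_per_drop_scope": drop_scope_every,
        }

    # e.g. two GPUs -> a pool of eight threads
    cfg = executor_tuning(2)

The returned values would then be assigned to the corresponding :code:`ExecutionStrategy` fields before building the executor.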
In multi-card training, you can use :code:`fluid.compiler.CompiledProgram` to compile the :code:`fluid.Program` and then call :code:`with_data_parallel`, as in the example above. Note that:
1. The constructor of :ref:`api_fluid_CompiledProgram` takes the :code:`fluid.Program` to be run, which cannot be modified at runtime.
2. If :code:`exe` is initialized with :code:`CUDAPlace`, the model runs on GPU. In graphics-card training mode, all visible graphics cards will be occupied. Users can configure `CUDA_VISIBLE_DEVICES <http://www.acceleware.com/blog/cudavisibledevices-masking-gpus>`_ to change which cards are used.
3. If :code:`exe` is initialized with :code:`CPUPlace`, the model runs on CPU. In this situation, multiple threads are used to run the model, and the number of threads is equal to the number of logical cores. Users can configure the `CPU_NUM` environment variable to change the number of threads that are used.
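The two environment variables mentioned in the notes can also be set from Python, as long as this happens before the framework initializes its devices; a minimal sketch:

.. code-block:: python

    import multiprocessing
    import os

    # Restrict the run to GPU cards 0 and 1 (ignored for CPU-only runs).
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

    # For CPU execution, use one thread per logical core.
    os.environ["CPU_NUM"] = str(multiprocessing.cpu_count())

Setting these in the launching shell instead (e.g. ``CPU_NUM=4 python train.py``) has the same effect.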