@@ -44,6 +44,8 @@ The :ref:`api_fluid_CompiledProgram` is used to transform a program for various
feed=feed_dict,
fetch_list=[loss.name])
**Note**: :code:`fluid.Porgram` and :code:`compiler.CompiledPorgram` are completely different :code:`Programs`. :code:`fluid.Porgram` is composed of a series of operators. :code:`compiler.CompiledPorgram` compiles the :code:`fluid.Porgram` and converts it into a computational graph. :code:`compiler.CompiledPorgram` cannot be saved at present.
:code:`ParallelExecutor` is an executor that executes :code:`Program` separately on multiple nodes in a data-parallelism manner. Users can use the Python script to run :code:`ParallelExecutor`. The execution process of :code:`ParallelExecutor` is as follows:
:code:`ParallelExecutor` is an upgraded version of Executor, in addition, it supports model training of :code:`Program` in parallel with data. Users can use the Python script to run :code:`ParallelExecutor`. The execution process of :code:`ParallelExecutor` is as follows:
- First it builds :code:`SSA Graph` and a thread pool based on :code:`Program`, the number of :code:`GPU` cards (or :code:`CPU` cores) and :ref:`api_fluid_BuildStrategy` ;
- During execution, it executes the Op depending on whether the input of Op is ready, so that multiple Ops that do not depend on each other can be executed in parallel in the thread pool;
...
...
@@ -69,6 +68,7 @@ Since the execution speed of the model is related to the model structure and the
**Note**: :code:`fluid.Executor` and :code:`fluid.ParallelExecutor` are two completely different executors. First of all, their execution objects are different. The execution object of :code:`fluid.Executor` is :code:`fluid.Program` and the execution object of :code:`fluid.ParallelExecutor` is Graph. Secondly, their execution schedules are different. :code:`fluid.Executor` runs one by one according to the order of operators in :code:`fluid.Program`. :code:`fluid.ParallelExecutor` executes concurrently according to the dependencies between nodes in Graph.
loss = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.mean(loss)
sgd = fluid.optimizer.SGD(learning_rate=0.001)
sgd.minimize(loss)
...
...
@@ -38,16 +34,13 @@ Initialize Parameters
Random Initialization of Parameters
====================================
After the configuration of model,the initialization of parameters will be written into :code:`fluid.default_startup_program()` . By running this program in :code:`fluid.Executor()` , the random initialization of parameters will be finished in global :code:`fluid.global_scope()` .For example:
After the configuration of model,the initialization of parameters will be written into :code:`fluid.default_startup_program()` . By running this program in :code:`fluid.Executor()` , the random initialization of parameters will be finished in global scope, i.e. :code:`fluid.global_scope()` .For example:
.. code-block:: python
exe = fluid.Executor(fluid.CUDAPlace(0))
exe.run(program=fluid.default_startup_program())
Note that in multi-GPU training, the parameters should be initialized on GPU0 and then will be distributed to multiple graphic cards through :code:`fluid.ParallelExecutor` .
Load Predefined Parameters
===========================
...
...
@@ -62,12 +55,37 @@ In the runtime, feed data with :code:`run(feed=...)` and get persistable data wi
.. code-block:: python
...
loss = fluid.layers.mean(...)
exe = fluid.Executor(...)
# the result is an numpy array
result = exe.run(feed={"image": ..., "label": ...}, fetch_list=[loss])
import paddle.fluid as fluid
import numpy
train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
data = fluid.layers.data(name='X', shape=[1], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
sgd = fluid.optimizer.SGD(learning_rate=0.001)
sgd.minimize(loss)
use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
# Run the startup program once and only once.
# Not need to optimize/compile the startup program.
startup_program.random_seed=1
exe.run(startup_program)
# Run the main program directly without compile.
x = numpy.random.random(size=(10, 1)).astype('float32')
@@ -81,21 +99,28 @@ In multi-card training, you can use :code:`fluid.compiler.CompiledProgram` to co
.. code-block:: python
exe = fluid.Executor(...)
compiled_prog = fluid.compiler.CompiledProgram(
fluid.default_main_program()).with_data_parallel(
loss_name=loss.name)
result = exe.run(program=compiled_prog,
fetch_list=[loss.name],
feed={"image": ..., "label": ...})
# NOTE: If you use CPU to run the program, you need
# to specify the CPU_NUM, otherwise, fluid will use
# all the number of the logic core as the CPU_NUM,
# in that case, the batch size of the input should be
# greater than CPU_NUM, if not, the process will be
# failed by an exception.
if not use_cuda:
os.environ['CPU_NUM'] = str(2)
compiled_prog = compiler.CompiledProgram(
train_program).with_data_parallel(
loss_name=loss.name)
loss_data, = exe.run(compiled_prog,
feed={"X": x},
fetch_list=[loss.name])
Notes:
1. The constructor of :ref:`api_fluid_CompiledProgram` needs to be set with :code:`fluid.Program` to be run which can not be modified at runtime.
2. If :code:`exe` is initialized with CUDAPlace, the model will be run in GPU. In the mode of graphics card training, all graphics card will be occupied. Users can configure `CUDA_VISIBLE_DEVICES <http://www.acceleware.com/blog/cudavisibledevices-masking-gpus>`_ to change graphics cards that are being used.
3. If :code:`exe` is initialized with CPUPlace, the model will be run in CPU. In this situation, the multi-threads are used to run the model, and the number of threads is equal to the number of logic cores. Users can configure `CPU_NUM` to change the number of threads that are being used.
1. :ref:`api_fluid_CompiledProgram` will convert the input Program into a computational graph, and :code:`compiled_prog` is a completely different object from the incoming :code:`train_program`. At present, :code:`compiled_prog` can not be saved.
2. Multi-card training can also be used: ref:`api_fluid_ParallelExecutor` , but now it is recommended to use: :ref:`api_fluid_CompiledProgram`.
3. If :code:`exe` is initialized with CUDAPlace, the model will be run in GPU. In the mode of graphics card training, all graphics card will be occupied. Users can configure `CUDA_VISIBLE_DEVICES <http://www.acceleware.com/blog/cudavisibledevices-masking-gpus>`_ to change graphics cards that are being used.
4. If :code:`exe` is initialized with CPUPlace, the model will be run in CPU. In this situation, the multi-threads are used to run the model, and the number of threads is equal to the number of logic cores. Users can configure `CPU_NUM` to change the number of threads that are being used.