Currently, PaddlePaddle Fluid provides two kinds of API interfaces:

- Low-level API:

  - It is highly flexible and relatively mature. Models trained with it can be directly deployed for C++ inference and released.
  - A large number of example models are available, covering all chapters of [book](https://github.com/PaddlePaddle/book) and the [models](https://github.com/PaddlePaddle/models) repository.
  - Recommended for users who have a certain understanding of deep learning and need to customize a network for training, inference, or online deployment.

- High-level API:

  - Simple to use.
  - Still under development; the interface temporarily lives in [paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib).
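As a rough orientation (a minimal sketch, not an official recommendation), the two branches are imported from different namespaces: the low-level API lives directly under `paddle.fluid`, while the experimental high-level pieces sit in `paddle.fluid.contrib`:

```python
import paddle.fluid as fluid            # low-level API: layers, executors, optimizers, ...
import paddle.fluid.contrib as contrib  # experimental high-level API, interface may change

# A typical low-level call: declare an input variable for the network.
x = fluid.layers.data(name="x", shape=[1], dtype="float32")
```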
This section introduces the structure and usage of the Fluid API to help you quickly get the full picture of the PaddlePaddle Fluid API. It is divided into the following modules:
Fluid supports parallel asynchronous distributed training. :code:`DistributeTranspiler` converts a single-node network configuration into a :code:`pserver`-side program and a :code:`trainer`-side program that can be executed on multiple machines. The user executes the same piece of code on different nodes; depending on environment variables or startup parameters, each node takes on the corresponding :code:`pserver` or :code:`trainer` role.
**Asynchronous distributed training in Fluid only supports the pserver mode.** The main difference from `synchronous training <../distributed/sync_training_en.html>`_ is that in asynchronous training the gradients of each trainer are applied to the parameters asynchronously, whereas in synchronous training the gradients of all trainers must first be aggregated and then used to update the parameters. Therefore, the hyperparameters of synchronous and asynchronous training need to be tuned separately.
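A minimal sketch of how asynchronous mode is typically switched on (here :code:`trainer_id`, :code:`trainer_num` and the endpoint list are placeholders supplied by your launcher; the key point is passing :code:`sync_mode=False` to :code:`transpile`):

.. code-block:: python

    import paddle.fluid as fluid

    t = fluid.DistributeTranspiler()
    # sync_mode=False lets every trainer push its gradients independently,
    # instead of waiting for the gradients of all trainers to be aggregated.
    t.transpile(trainer_id,
                program=fluid.default_main_program(),
                pservers="192.168.0.1:6174,192.168.0.2:6174",
                trainers=trainer_num,
                sync_mode=False)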
Fluid supports parallel distributed synchronous training. The API uses :code:`DistributeTranspiler` to convert a single-node network configuration into :code:`pserver`-side and :code:`trainer`-side programs that can be executed on multiple machines. The user executes the same piece of code on different nodes; depending on environment variables or startup parameters, each node takes on the corresponding :code:`pserver` or :code:`trainer` role. Fluid distributed synchronous training supports both pserver mode and NCCL2 mode, and the API usage differs between the two, to which you need to pay attention.
Distributed training in pserver mode
======================================
For API Reference, please refer to :ref:`DistributeTranspiler`. A simple example:
.. code-block:: python

    import paddle.fluid as fluid

    config = fluid.DistributeTranspilerConfig()
    # Configure the transpiler policy
    config.slice_var_up = False

    t = fluid.DistributeTranspiler(config=config)
    t.transpile(trainer_id,
                program=main_program,
                pservers="192.168.0.1:6174,192.168.0.2:6174",
                ...
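After :code:`transpile`, each process fetches the program that matches its own role. The following is only a sketch: :code:`role`, :code:`current_endpoint` and the executor :code:`exe` are assumed to come from your startup parameters or environment variables.

.. code-block:: python

    if role == "PSERVER":
        # Build and run the parameter-server side program for this endpoint.
        pserver_prog = t.get_pserver_program(current_endpoint)
        pserver_startup = t.get_startup_program(current_endpoint, pserver_prog)
        exe.run(pserver_startup)
        exe.run(pserver_prog)
    elif role == "TRAINER":
        # Fetch the trainer side program and run the usual training loop on it.
        trainer_prog = t.get_trainer_program()
        exe.run(fluid.default_startup_program())
        # ... feed data and call exe.run(trainer_prog, ...) in a loop ...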
Configuration for general environment variables:
- :code:`FLAGS_rpc_deadline` : int, the longest waiting time for RPC communication, in milliseconds, default 180000
Distributed training in NCCL2 mode
====================================
The multi-node synchronous training mode based on NCCL2 (Collective Communication) is only supported on GPU clusters.
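A minimal sketch of the NCCL2 configuration (assuming :code:`trainer_id` and :code:`startup_program` are provided by your launcher; the IP addresses are example values):

.. code-block:: python

    config = fluid.DistributeTranspilerConfig()
    config.mode = "nccl2"

    t = fluid.DistributeTranspiler(config=config)
    # In NCCL2 mode there is no pserver program: every node is a trainer, and
    # the transpiler only needs the full endpoint list plus the local endpoint.
    t.transpile(trainer_id,
                trainers="192.168.0.1:6174,192.168.0.2:6174",
                current_endpoint="192.168.0.1:6174",
                startup_program=startup_program)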
...
As mentioned above, a :code:`Block` in Fluid describes a set of Operators, covering sequential execution, conditional selection, or loop execution, together with the object the Operators operate on: the Tensor.
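This can be observed directly: every layer call appends one or more Operators to the current :code:`Block` of the default main :code:`Program`. A minimal sketch:

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name="x", shape=[13], dtype="float32")
    y = fluid.layers.fc(input=x, size=1)

    # Printing the Program shows block 0 and the Operators (and their input /
    # output Tensors) that the two layer calls above have recorded.
    print(fluid.default_main_program())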
...
For more information, please refer to `Fluid Design Idea <../../advanced_usage/design_idea/fluid_design_idea.html>`_.
...

Related API
===========
* Users can also use :ref:`api_fluid_program_guard` together with the :code:`with` statement to modify the configured :ref:`api_fluid_default_startup_program` and :ref:`api_fluid_default_main_program`, as shown in the sketch after this list.
* In Fluid, the execution order in a Block is determined by control flow, such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` . For more information, please refer to: :ref:`api_guide_control_flow_en`
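A minimal sketch of the :code:`program_guard` usage referenced above (the layers chosen here are only illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    main_program = fluid.Program()
    startup_program = fluid.Program()

    # Everything defined inside this context is recorded into main_program and
    # startup_program instead of the global default programs.
    with fluid.program_guard(main_program, startup_program):
        image = fluid.layers.data(name="image", shape=[784], dtype="float32")
        prediction = fluid.layers.fc(input=image, size=10, act="softmax")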
...
4. (Only For Python3) Set Python-related environment variables:
- a. First use ```find `dirname $(dirname $(which python3))` -name "libpython3.*.dylib"``` to find the path to the Python library (the first path printed is the dylib path of the Python you need to use), then replace [python-lib-path] below with the file path you found.
...
Since we are using CMake 3.4, please follow the steps below:
1. Download the CMake image from the [official CMake website](https://cmake.org/files/v3.4/cmake-3.4.3-Darwin-x86_64.dmg) and install it.
2. Enter `sudo "/Applications/CMake.app/Contents/bin/cmake-gui" --install` in the console
- b. If you do not want to use the system default BLAS and want to use your own installed OpenBLAS, please read the [FAQ](../FAQ.html/#OPENBLAS)