By now you should already be familiar with Fluid, and your next step is probably to build a more efficient model or to invent your own Operator. If so, read more on:
- `Design Principles of Fluid <../advanced_usage/design_idea/fluid_design_idea_en.html>`_ : Design principles underlying Fluid to help you understand how the framework runs.
- `Deploy Inference Model <../advanced_usage/deploy/index_en.html>`_ : How to deploy a trained network to perform practical inference.
This guide shows how to compile PaddlePaddle on a *64-bit desktop or laptop* running Windows 10. The supported Windows systems must meet the following requirements:
...
...
@@ -60,7 +59,6 @@ Please note: The current version does not support NCCL and distributed related f
6. Execute cmake:
> For details on the compilation options, see [the compilation options list](../Tables.html/#Compile).
* For users who need to compile **the CPU version of PaddlePaddle**:
For Python 2: `cmake .. -G "Visual Studio 14 2015 Win64" -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS}`
* *Windows 7/8/10 Pro/Enterprise (64bit) (CUDA 8.0/9.0/10.0 are supported, and only single GPU is supported)*
* *Python 2.7.15+/3.5.1+/3.6/3.7 (64bit)*
* *pip or pip3 9.0.1+ (64bit)*
...
...
@@ -16,10 +16,10 @@
* If your computer doesn't have an NVIDIA® GPU, please install the CPU version of PaddlePaddle
* If your computer has an NVIDIA® GPU and it satisfies the following requirements, we recommend installing the GPU version of PaddlePaddle
* *CUDA Toolkit 8.0 with cuDNN v7.1+, or CUDA Toolkit 9.0/10.0 with cuDNN v7.3+*
* *GPU's compute capability exceeds 1.0*
Note: Currently, the official Windows installation packages only support CUDA 8.0/9.0/10.0 with a single GPU, and do not support CUDA 9.1/9.2/10.1. If you need those versions, please compile PaddlePaddle from source.
Please refer to the NVIDIA official documents for the installation process and the configuration methods of [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) and [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/).
...
...
@@ -43,7 +43,7 @@ There is a checking function below for [verifyig whether the installation is suc
Notice:
* The pip version must match the Python version: Python 2.7 corresponds to `pip`; Python 3.x corresponds to `pip3`.
* `pip install paddlepaddle-gpu`: this command will install PaddlePaddle with support for CUDA 8.0 (with cuDNN v7.1+) or CUDA 9.0/10.0 (with cuDNN v7.3+).
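To quickly verify the installation, you can run the snippet below. This is a minimal sketch; it assumes the `install_check` helper shipped with recent 1.x releases is available in your installed version.

```python
import paddle.fluid as fluid

# Runs a small built-in program on the available device and reports
# whether PaddlePaddle was installed successfully.
fluid.install_check.run_check()
```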
Specific output values will be shown when the Executor runs. There are two ways to get a runtime Variable value: the first is to use `paddle.fluid.layers.Print` to create a print op that prints the tensor being accessed; the second is to add the Variable to the fetch_list.
For more information on how to use the Print API, please refer to [Print operator](https://www.paddlepaddle.org.cn/documentation/docs/en/1.5/api/layers/control_flow.html#print).
The detailed process of the second way, fetch_list, will be explained later; a minimal sketch of both ways follows.
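The sketch below illustrates both ways; the small fully connected network and the random input are placeholders for illustration only.

```python
import numpy as np
import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[2], dtype='float32')
hidden = fluid.layers.fc(input=x, size=4)

# Way 1: insert a Print op; the tensor is printed whenever the op runs.
hidden = fluid.layers.Print(hidden, message="hidden value:")

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# Way 2: add the Variable to fetch_list; its value is returned by run().
out, = exe.run(feed={'x': np.random.random((3, 2)).astype('float32')},
               fetch_list=[hidden])
print(out)
```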
FLAGS_conv_workspace_size_limit=1024 sets the workspace size limit for choosing cuDNN convolution algorithms to 1024 MB.
FLAGS_cudnn_batchnorm_spatial_persistent
*******************************************
(since 1.4.0)
...
...
@@ -37,7 +37,7 @@ Note
This mode can be faster in some tasks because an optimized path is selected for the CUDNN_DATA_FLOAT and CUDNN_DATA_HALF data types. The reason we set it to False by default is that this mode may use scaled atomic integer reduction, which may cause numerical overflow for some input data ranges.
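The cuDNN-related flags above are ordinary GLOG-style flags, so they can be set through environment variables before the framework initializes. A minimal sketch, with values that are only illustrative:

.. code-block:: python

    import os

    # Set the flags before importing paddle.fluid, which is when they are read.
    # Equivalent to "export FLAGS_conv_workspace_size_limit=1024" in the shell.
    os.environ['FLAGS_conv_workspace_size_limit'] = '1024'            # in MB
    os.environ['FLAGS_cudnn_batchnorm_spatial_persistent'] = 'True'

    import paddle.fluid as fluid  # noqa: E402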
FLAGS_cudnn_deterministic
*******************************************
(since 0.13.0)
...
...
@@ -56,7 +56,7 @@ Note
Now this flag is enabled in the cuDNN convolution and pooling operators. The deterministic algorithms may be slower, so this flag is generally used for debugging.
This flag is only for PaddlePaddle developers; users should not set it.
FLAGS_communicator_independent_recv_thread
*******************************************************
(since 1.5.0)
...
...
@@ -40,7 +40,7 @@ Note
This flag is for developers to debug and optimize the framework. Users should not set it.
FLAGS_communicator_max_merge_var_num
**************************************
(since 1.5.0)
...
...
@@ -59,7 +59,7 @@ Note
This flag is closely related to the trainer thread number. The default value should be the same as the thread number.
FLAGS_communicator_merge_sparse_grad
*******************************************************
(since 1.5.0)
...
...
@@ -78,11 +78,11 @@ Note
Merging sparse gradients can be time-consuming. If a sparse gradient has many duplicated ids, merging saves memory and makes communication much faster; otherwise it does not save memory.
FLAGS_communicator_min_send_grad_num_before_recv
*******************************************************
(since 1.5.0)
In the communicator there is one send thread that sends gradients to the parameter server and one receive thread that receives parameters from the parameter server. They work independently. This flag controls the frequency of the receive thread: only after the send thread has sent at least FLAGS_communicator_min_send_grad_num_before_recv gradients will the receive thread fetch parameters from the parameter server.
Values accepted
---------------
...
...
@@ -97,7 +97,7 @@ Note
This flag is closely related to the number of training threads in the trainer, because each training thread sends its own gradients. So the default value should equal the number of training threads.
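Like the other flags, the communicator flags can be set through environment variables before training starts. A minimal sketch that keeps them consistent with the trainer thread number, as the notes above recommend; the values are only illustrative:

.. code-block:: python

    import os

    trainer_thread_num = 8  # illustrative; match your trainer configuration

    # The notes above recommend keeping both flags equal to the thread number.
    os.environ['FLAGS_communicator_max_merge_var_num'] = str(trainer_thread_num)
    os.environ['FLAGS_communicator_min_send_grad_num_before_recv'] = str(trainer_thread_num)

    import paddle.fluid as fluid  # noqa: E402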
FLAGS_communicator_send_queue_size
*******************************************
(since 1.5.0)
...
...
@@ -116,7 +116,7 @@ Note
This flag affects training speed: a larger queue size may make training faster, but may also make the result worse.
FLAGS_communicator_send_wait_times
*******************************************
(since 1.5.0)
...
...
@@ -131,7 +131,7 @@ Example
FLAGS_communicator_send_wait_times=5 sets to 5 the number of times the send thread waits when the merge number does not reach max_merge_var_num.
FLAGS_communicator_thread_pool_size
*******************************************
(since 1.5.0)
...
...
@@ -150,7 +150,7 @@ Note
Most of the time users do not need to set this flag.
FLAGS_dist_threadpool_size
*******************************************
(Since 1.0.0)
...
...
@@ -165,7 +165,7 @@ Example
FLAGS_dist_threadpool_size=10 sets the maximum number of threads used by the distributed module to 10.
FLAGS_rpc_deadline
*******************************************
(Since 1.0.0)
...
...
@@ -180,11 +180,11 @@ Example
FLAGS_rpc_deadline=180000 will set the deadline timeout to 3 minutes.
FLAGS_rpc_disable_reuse_port
*******************************************
(since 1.2.0)
When FLAGS_rpc_disable_reuse_port is true, the grpc flag GRPC_ARG_ALLOW_REUSEPORT will be set to false to disable the use of SO_REUSEPORT if it is available.
Values accepted
...
...
@@ -196,7 +196,7 @@ Example
FLAGS_rpc_disable_reuse_port=True will disable the use of SO_REUSEPORT.
FLAGS_rpc_get_thread_num
*******************************************
(Since 1.0.0)
...
...
@@ -211,7 +211,7 @@ Example
FLAGS_rpc_get_thread_num=6 will use 6 threads to get parameters from the parameter server.
FLAGS_rpc_send_thread_num
*******************************************
(Since 1.0.0)
...
...
@@ -226,11 +226,11 @@ Example
FLAGS_rpc_send_thread_num=6 sets the number of threads used for sending to 6.
FLAGS_rpc_server_profile_path
*******************************************
(since 0.15.0)
Set the profiler output log file path prefix. The complete path will be FLAGS_rpc_server_profile_path_listener_id, where listener_id is a random number.
@@ -21,7 +21,7 @@ FLAGS_allocator_strategy=naive_best_fit would use the new-designed allocator.
FLAGS_eager_delete_scope
*******************************************
(since 0.12.0)
...
...
@@ -36,7 +36,7 @@ Example
FLAGS_eager_delete_scope=True will make scope deletion synchronous.
FLAGS_eager_delete_tensor_gb
*******************************************
(since 1.0.0)
...
...
@@ -60,7 +60,7 @@ It is recommended that users enable garbage collection strategy by setting FLAGS
FLAGS_enable_inplace_whitelist
*******************************************
(since 1.4)
...
...
@@ -76,7 +76,7 @@ FLAGS_enable_inplace_whitelist=True would disable memory in-place optimization o
FLAGS_fast_eager_deletion_mode
*******************************************
(since 1.3)
...
...
@@ -93,7 +93,7 @@ FLAGS_fast_eager_deletion_mode=True would turn on fast garbage collection strate
FLAGS_fast_eager_deletion_mode=False would turn off fast garbage collection strategy.
FLAGS_fraction_of_gpu_memory_to_use
*******************************************
(since 1.2.0)
...
...
@@ -113,7 +113,7 @@ Windows series platform will set FLAGS_fraction_of_gpu_memory_to_use to 0.5 by d
Linux will set FLAGS_fraction_of_gpu_memory_to_use to 0.92 by default.
FLAGS_free_idle_memory
*******************************************
(since 0.15.0)
...
...
@@ -130,7 +130,7 @@ FLAGS_free_idle_memory=True will free idle memory when there is too much of it.
FLAGS_free_idle_memory=False will not free idle memory.
FLAGS_fuse_parameter_groups_size
*******************************************
(since 1.4.0)
...
...
@@ -146,7 +146,7 @@ FLAGS_fuse_parameter_groups_size=3 will set the size of one group parameters' gr
FLAGS_fuse_parameter_memory_size
*******************************************
(since 1.5.0)
...
...
@@ -161,7 +161,7 @@ Example
FLAGS_fuse_parameter_memory_size=16 sets the upper memory limit of one group of parameters' gradients to 16 megabytes.
FLAGS_init_allocated_mem
*******************************************
(since 0.15.0)
...
...
@@ -178,7 +178,7 @@ FLAGS_init_allocated_mem=True will make the allocated memory initialize as a non
FLAGS_init_allocated_mem=False will not initialize the allocated memory.
FLAGS_initial_cpu_memory_in_mb
*******************************************
(since 0.14.0)
...
...
@@ -193,7 +193,7 @@ Example
FLAGS_initial_cpu_memory_in_mb=100: if FLAGS_fraction_of_cpu_memory_to_use*(total physical memory) > 100MB, the allocator will pre-allocate 100MB when the first allocation request arrives, and allocate another 100MB when the pre-allocated memory is exhausted.
FLAGS_initial_gpu_memory_in_mb
*******************************************
(since 1.4.0)
...
...
@@ -213,7 +213,7 @@ If you set this flag, the memory size set by FLAGS_fraction_of_gpu_memory_to_use
If you don't set this flag, PaddlePaddle will use FLAGS_fraction_of_gpu_memory_to_use to allocate gpu memory.
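A minimal sketch of the two GPU pre-allocation modes described above; the sizes are only illustrative:

.. code-block:: python

    import os

    # Mode 1: pre-allocate a fraction of the total GPU memory.
    os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.5'

    # Mode 2: pre-allocate a fixed chunk instead; when these flags are set,
    # they take precedence over the fraction-based strategy.
    # os.environ['FLAGS_initial_gpu_memory_in_mb'] = '500'
    # os.environ['FLAGS_reallocate_gpu_memory_in_mb'] = '100'

    import paddle.fluid as fluid  # noqa: E402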
FLAGS_limit_of_tmp_allocation
*******************************************
(since 1.3)
...
...
@@ -228,7 +228,7 @@ Example
FLAGS_limit_of_tmp_allocation=1024 will set the upper limit of the temporary_allocation size to 1024 bytes.
FLAGS_memory_fraction_of_eager_deletion
*******************************************
(since 1.4)
...
...
@@ -248,7 +248,7 @@ FLAGS_memory_fraction_of_eager_deletion=1 would release all temporary variables.
FLAGS_memory_fraction_of_eager_deletion=0.5 would release only the 50% of variables with the largest memory size.
FLAGS_reallocate_gpu_memory_in_mb
*******************************************
(since 1.4.0)
...
...
@@ -268,12 +268,12 @@ If this flag is set, PaddlePaddle will reallocate the gpu memory with size speci
Otherwise, PaddlePaddle will reallocate with the size set by FLAGS_fraction_of_gpu_memory_to_use.
FLAGS_times_excess_than_required_tmp_allocation
*******************************************************
(since 1.3)
FLAGS_times_excess_than_required_tmp_allocation indicates the maximum size the TemporaryAllocator can return. For example, if the required memory size is N and FLAGS_times_excess_than_required_tmp_allocation is 2.0, the TemporaryAllocator will return an available allocation whose size is in the range N ~ 2*N.
Values accepted
---------------
...
...
@@ -284,7 +284,7 @@ Example
FLAGS_times_excess_than_required_tmp_allocation=1024 will set the maximum size the TemporaryAllocator can return to 1024*N.
- `LoD-Tensor User Guide <lod_tensor_en.html>`_ : LoD-Tensor is a concept unique to Fluid. It appends sequence information to a Tensor and supports data of variable lengths.
For more difficult and complex application examples, please refer to the related information in `models <../../../user_guides/models/index_en.html>`_ .
This document mainly introduces how to provide data to the network, covering both the synchronous method and the asynchronous method.
.. toctree::
   :maxdepth: 1

   prepare_steps_en.rst
   reader.md
PaddlePaddle Fluid supports two methods to feed data into networks:
1. Synchronous method - Python Reader: first, use :code:`fluid.layers.data` to set up the data input layer; then feed in the training data through :code:`executor.run(feed=...)` in :code:`fluid.Executor` or :code:`fluid.ParallelExecutor` (a short sketch is given below).
2. Asynchronous method - py_reader: first, use :code:`fluid.layers.py_reader` to set up the data input layer; then configure the data source with the :code:`decorate_paddle_reader` or :code:`decorate_tensor_provider` function of :code:`py_reader` ; finally, call :code:`fluid.layers.read_file` to read the data (a short sketch is given further below).
Python Reader is a pure Python-side interface, and data feeding is synchronized with the model training/prediction process. Users can pass in data through Numpy Array. For specific operations, please refer to:
.. toctree::
   :maxdepth: 1

   feeding_data_en.rst
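A minimal sketch of the synchronous method (item 1 above), using :code:`fluid.DataFeeder` to turn a Python list of samples into the feed dictionary; the tiny network and the random data are placeholders for illustration only:

.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    # Set up the data input layers.
    image = fluid.layers.data(name='image', shape=[784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    prediction = fluid.layers.fc(input=image, size=10, act='softmax')

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())

    # One mini-batch of fake data: a list of (image, label) samples.
    batch = [(np.random.random(784).astype('float32'),
              np.random.randint(0, 10, (1,)).astype('int64'))
             for _ in range(8)]

    # Feed the batch synchronously through executor.run(feed=...).
    feeder = fluid.DataFeeder(feed_list=[image, label], place=place)
    out, = exe.run(feed=feeder.feed(batch), fetch_list=[prediction])
    print(out.shape)   # (8, 10)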
Python Reader supports advanced functions such as batching and shuffling. For specific operations, please refer to:
.. toctree::
   :maxdepth: 1

   reader.md
Asynchronous py_reader
########################
Fluid provides the asynchronous data feeding method PyReader. It is more efficient because data feeding is not synchronized with the model training/prediction process. For specific operations, please refer to:
Besides Python Reader, we provide PyReader. The performance of PyReader is better than :ref:`user_guide_use_numpy_array_as_train_data_en` , because the process of loading data is asynchronous with the process of training the model when PyReader is in use. PyReader can also work with :code:`double_buffer_reader` to improve the performance of reading data. Moreover, :code:`double_buffer_reader` can perform the transformation from CPU Tensor to GPU Tensor, which further improves the efficiency of reading data.
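A minimal sketch of the asynchronous method with :code:`py_reader` (item 2 above), assuming the MNIST reader from :code:`paddle.dataset` purely for illustration:

.. code-block:: python

    import paddle
    import paddle.fluid as fluid

    # Set up the asynchronous data input layer.
    py_reader = fluid.layers.py_reader(capacity=64,
                                       shapes=[(-1, 1, 28, 28), (-1, 1)],
                                       dtypes=['float32', 'int64'])

    # Configure the data source, then unpack the input Variables.
    py_reader.decorate_paddle_reader(
        paddle.batch(paddle.dataset.mnist.train(), batch_size=32))
    image, label = fluid.layers.read_file(py_reader)
    prediction = fluid.layers.fc(input=image, size=10, act='softmax')
    loss = fluid.layers.mean(
        fluid.layers.cross_entropy(input=prediction, label=label))

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    py_reader.start()                      # start asynchronous feeding
    try:
        while True:
            loss_val, = exe.run(fetch_list=[loss])   # no feed= needed
    except fluid.core.EOFException:
        py_reader.reset()                  # one pass over the data finished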
- `Prepare Data <../user_guides/howto/prepare_data/index_en.html>`_ : This section introduces the supported data types and data transmission methods for training your networks with Fluid.