Unverified commit e4cf6e82 authored by xsrobin, committed by GitHub

modify for pr=17924 (#886)

* new flags documents

* flags

* flags

* flags

* modify for PR=17924

* modify for pr=17924

* modify for pr=17924
Parent 181cfb26
...
@@ -3,7 +3,7 @@ API
=====
.. toctree::
    :hidden:

    ../flags_cn.rst
    ../api_guides/index_cn.rst
...
==================
FLAGS (environment variables)
==================
...
@@ -443,21 +444,6 @@
 FLAGS_fuse_parameter_memory_size=131072 - sets the gradient size of one group of parameters to 3.

-fuse_parameter_memory_size
-*******************************************
-(since 1.4.0)
-FLAGS_fuse_parameter_memory_size indicates the upper memory limit, in bytes, of one group of parameter gradients used as the input of a communication call (e.g. NCCLAllReduce). The default value is 0, which means groups are not formed according to memory_size.
-Values accepted
----------------
-Uint64. The default value is 0.
-Example
--------
-FLAGS_fuse_parameter_memory_size=131072 - sets the upper size limit of one group of parameter gradients to 131072 bytes.

 init_allocated_mem
 *******************************************
 (since 0.15.0)
...
==================
FLAGS
==================
allocator_strategy
**************************************
(since 1.2)
Used to choose the allocator strategy of PaddlePaddle. The allocator strategy is under development, and the non-legacy allocator is not stable yet.
Values accepted
---------------
String, enum in ['legacy', 'naive_best_fit']. The default value is 'legacy'.
Example
--------
FLAGS_allocator_strategy=legacy would use the legacy allocator.
FLAGS_allocator_strategy=naive_best_fit would use the newly designed allocator.
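All of the FLAGS on this page are ordinary environment variables read when the framework starts up. A minimal sketch of setting one from Python, assuming the flag is consumed at import time (the shell form ``export FLAGS_allocator_strategy=naive_best_fit`` is equivalent):

.. code-block:: python

    import os

    # Flags must be present in the environment before the framework
    # initializes, hence before the import below.
    os.environ['FLAGS_allocator_strategy'] = 'naive_best_fit'

    import paddle.fluid as fluid  # the flag takes effect from here on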
benchmark
**************************************
(since 0.12.0)
Used for benchmarking. If set, it will make scope deletion synchronized, add some memory usage logs, and synchronize all CUDA kernels after they are launched.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_benchmark=True will perform some synchronizations for benchmarking.
check_nan_inf
**************************************
(since 0.13.0)
This flag is used for debugging. It checks whether the result of an operator contains NaN or Inf.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_check_nan_inf=True will check whether the result of each operator contains NaN or Inf.
communicator_fake_rpc
**************************************
(since 1.5.0)
When set to True, the communicator will not actually perform RPC calls, so the speed will not be affected by network communication. This flag is used for debugging purposes.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_communicator_fake_rpc=True will enable the communicator fake mode.
Note
-------
This flag is only for PaddlePaddle developers; users should not set it.
communicator_independent_recv_thread
**************************************
(since 1.5.0)
Whether to use an independent thread to receive parameters from the parameter server.
Values accepted
---------------
Bool. The default value is True.
Example
-------
FLAGS_communicator_independent_recv_thread=True will use an independent thread to receive parameters from the parameter server.
Note
-------
This flag is for developers to debug and optimize the framework; users should not set it.
communicator_max_merge_var_num
**************************************
(since 1.5.0)
The maximum number of gradients to merge and send as one gradient by the communicator. The trainer puts all gradients into a queue, and the communicator then takes the gradients out of the queue and merges them before sending.
Values accepted
---------------
Int32. The default value is 20.
Example
-------
FLAGS_communicator_max_merge_var_num=16 will set to 16 the maximum number of gradients merged and sent as one gradient.
Note
-------
This flag has a strong relationship with the trainer thread number. The default value should be the same as the thread number.
communicator_min_send_grad_num_before_recv
*******************************************
(since 1.5.0)
In the communicator, there is one send thread that sends gradients to the parameter server and one receive thread that receives parameters from the parameter server. They work independently. This flag is used to control the frequency of the receive thread: only after the send thread has sent at least communicator_min_send_grad_num_before_recv gradients will the receive thread receive parameters from the parameter server.
Values accepted
---------------
Int32. The default value is 20.
Example
-------
FLAGS_communicator_min_send_grad_num_before_recv=10 will require the send thread to send 10 gradients before the receive thread receives parameters from the parameter server.
Note
-------
This flag has a strong relation with the training threads of the trainer, because each training thread sends its gradients. So the default value should be the training thread number.
communicator_send_queue_size
*******************************************
(since 1.5.0)
The queue size for each gradient. The trainer puts gradients into a queue, and the communicator takes gradients out of the queue and sends them. When the communicator is slow, the queue may become full and the trainer will be blocked until the queue has space. It is used to avoid the situation where training is much faster than communication and too many gradients pile up unsent.
Values accepted
---------------
Int32. The default value is 20.
Example
-------
FLAGS_communicator_send_queue_size=10 will set the queue size for each gradient to 10.
Note
-------
This flag affects the training speed: a larger queue size may make training faster, but may also make the result worse.
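Since the notes above recommend keeping these communicator flags aligned with the trainer thread count, a hedged sketch of configuring them together (the thread count of 16 is a hypothetical value):

.. code-block:: python

    import os

    # Hypothetical trainer thread count; the notes above suggest using
    # it as the default for all three communicator flags.
    trainer_threads = 16
    os.environ['FLAGS_communicator_max_merge_var_num'] = str(trainer_threads)
    os.environ['FLAGS_communicator_min_send_grad_num_before_recv'] = str(trainer_threads)
    os.environ['FLAGS_communicator_send_queue_size'] = str(trainer_threads)

    import paddle.fluid as fluid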
communicator_send_wait_times
*******************************************
(since 1.5.0)
The number of times the send thread will wait if the merged gradient count does not reach max_merge_var_num.
Values accepted
---------------
Int32. The default value is 5.
Example
-------
FLAGS_communicator_send_wait_times=5 sets to 5 the number of times the send thread will wait if the merged gradient count does not reach max_merge_var_num.
communicator_thread_pool_size
*******************************************
(since 1.5.0)
Sets the size of the thread pool used for gradient sending and parameter receiving.
Values accepted
---------------
Int32. The default value is 5.
Example
-------
FLAGS_communicator_thread_pool_size=10 sets the thread pool size to 10.
Note
-------
Most of the time users do not need to set this flag.
conv_workspace_size_limit
*******************************************
(since 0.13.0)
The workspace limit size, in MB, for choosing cuDNN convolution algorithms. The inner function of cuDNN obtains the fastest suitable algorithm that fits within this memory limit. Usually, a larger workspace may allow faster algorithms to be chosen, at the cost of significantly more workspace memory. Users need to trade off between memory and speed.
Values accepted
---------------
Uint64. The default value is 4096. That is to say, a 4 GB memory workspace.
Example
-------
FLAGS_conv_workspace_size_limit=1024 sets the workspace limit for choosing cuDNN convolution algorithms to 1024 MB.
cpu_deterministic
*******************************************
(since 0.15.0)
This flag is used for debugging. It indicates whether to make the result of computation deterministic on the CPU side. In some cases, the results of summing in different orders may differ; for example, the result of `a+b+c+d` may differ from the result of `c+a+b+d`.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_cpu_deterministic=True will make the result of computation deterministic on the CPU side.
cudnn_batchnorm_spatial_persistent
*******************************************
(since 1.4.0)
Indicates whether to use the new batch normalization mode CUDNN_BATCHNORM_SPATIAL_PERSISTENT in batchnorm.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_cudnn_batchnorm_spatial_persistent=True will enable the CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode.
Note
-------
This mode can be faster in some tasks because an optimized path will be selected for the CUDNN_DATA_FLOAT and CUDNN_DATA_HALF data types. The reason we set it to False by default is that this mode may use scaled atomic integer reduction, which may cause numerical overflow for some input data ranges.
cudnn_deterministic
*******************************************
(since 0.13.0)
For a given operation, cuDNN has several algorithms, and some of them give non-deterministic results, such as the convolution algorithms. This flag is used for debugging. It indicates whether to choose deterministic algorithms in cuDNN.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_cudnn_deterministic=True will choose deterministic algorithms in cuDNN.
Note
-------
Currently this flag takes effect in the cuDNN convolution and pooling operators. The deterministic algorithms may be slower, so this flag is generally used for debugging.
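For reproducibility debugging, the two determinism flags above can be combined. A minimal sketch, assuming both are read from the environment at startup:

.. code-block:: python

    import os

    # Deterministic algorithms are slower; enable them only while
    # hunting for non-reproducible results.
    os.environ['FLAGS_cudnn_deterministic'] = 'True'
    os.environ['FLAGS_cpu_deterministic'] = 'True'

    import paddle.fluid as fluid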
cudnn_exhaustive_search
*******************************************
(since 1.2.0)
Whether to use the exhaustive search method to choose convolution algorithms. There are two search methods in cuDNN, heuristic search and exhaustive search. The exhaustive search tries all cuDNN algorithms to find the fastest one. This method is time-consuming, and the chosen algorithm will be cached for the given layer specifications. Once the layer specifications (such as batch size or feature map size) change, it will search again.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_cudnn_exhaustive_search=True will use the exhaustive search method to choose convolution algorithms.
dist_threadpool_size
*******************************************
(Since 1.0.0)
Controls the number of threads used by the distributed module. If not set, it defaults to the number of hardware threads.
Values accepted
---------------
Int32. The default value is 0.
Example
-------
FLAGS_dist_threadpool_size=10 will use at most 10 threads for the distributed module.
eager_delete_scope
*******************************************
(since 0.12.0)
Make scope deletion synchronous. If set, it will reduce GPU memory usage but slow down the destruction of variables (around 1% performance loss).
Values accepted
---------------
Bool. The default value is True.
Example
-------
FLAGS_eager_delete_scope=True will make scope deletion synchronous.
eager_delete_tensor_gb
*******************************************
(since 1.0.0)
Whether to use the garbage collection strategy to optimize the memory usage of the network. If FLAGS_eager_delete_tensor_gb >= 0, the garbage collection strategy is enabled and collects memory garbage while running the network, which is beneficial for saving memory; the garbage collector will not release memory garbage until its total size reaches FLAGS_eager_delete_tensor_gb GB. If FLAGS_eager_delete_tensor_gb < 0, the garbage collection strategy is disabled. It is only useful when you use Executor to run a program, compile a program, or compile a program with data parallelism.
Values accepted
---------------
Double, in GB. The default value is -1.0.
Example
-------
FLAGS_eager_delete_tensor_gb=0.0 would release memory garbage immediately once it is no longer used.
FLAGS_eager_delete_tensor_gb=1.0 would not release memory garbage until its total size reaches 1.0 GB.
FLAGS_eager_delete_tensor_gb=-1.0 would disable the garbage collection strategy.
Note
-------
It is recommended that users enable the garbage collection strategy by setting FLAGS_eager_delete_tensor_gb=0.0 when training large networks.
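Following the recommendation above, a minimal sketch that turns on eager garbage collection (together with fast_eager_deletion_mode, described later on this page):

.. code-block:: python

    import os

    # Release each piece of memory garbage as soon as it is unused.
    os.environ['FLAGS_eager_delete_tensor_gb'] = '0.0'
    # Fast mode (see fast_eager_deletion_mode below) releases memory
    # without waiting for the CUDA kernels to finish.
    os.environ['FLAGS_fast_eager_deletion_mode'] = 'True'

    import paddle.fluid as fluid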
enable_cublas_tensor_op_math
*******************************************
(since 1.2.0)
This flag indicates whether to use Tensor Cores, which may lose some precision.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_enable_cublas_tensor_op_math=True will use Tensor Cores.
enable_inplace_whitelist
*******************************************
(since 1.4)
Used for debugging, to disable memory in-place optimization in some ops. If set, some ops will not perform in-place optimization to save memory. These ops include: sigmoid, exp, relu, tanh, sqrt, ceil, floor, reciprocal, relu6, soft_relu, hard_sigmoid, batch_norm, batch_norm_grad, sum, sum_grad, scale, reshape, elementwise_add, and elementwise_add_grad.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_enable_inplace_whitelist=True would disable memory in-place optimization on certain ops.
enable_parallel_graph
*******************************************
(since 1.2.0)
This flag controls whether ParallelExecutor uses the parallel graph execution mode.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_enable_parallel_graph=False will force ParallelExecutor to disable the parallel graph execution mode.
enable_rpc_profiler
*******************************************
(Since 1.0.0)
Whether to enable the RPC profiler.
Values accepted
----------------
Bool. The default value is False.
Example
-------
FLAGS_enable_rpc_profiler=True will enable the RPC profiler and record the timeline to a profiler file.
fast_eager_deletion_mode
*******************************************
(since 1.3)
Whether to use the fast garbage collection strategy. If not set, GPU memory would be released when the CUDA kernel ends. Otherwise, GPU memory would be released without waiting for the CUDA kernel to end, making the garbage collection strategy faster. Only valid when the garbage collection strategy is enabled.
Values accepted
---------------
Bool. The default value is True.
Example
-------
FLAGS_fast_eager_deletion_mode=True would turn on the fast garbage collection strategy.
FLAGS_fast_eager_deletion_mode=False would turn off the fast garbage collection strategy.
fraction_of_gpu_memory_to_use
*******************************************
(since 1.2.0)
Allocate a chunk of GPU memory that is this fraction of the total GPU memory size. Future memory usage will be allocated from the chunk. If the chunk doesn't have enough GPU memory, additional chunks of the same size will be requested from the GPU until the GPU has no memory left for another chunk.
Values accepted
---------------
A floating-point value greater than 0, which is the fraction of total GPU memory used for the initial chunk.
Example
-------
FLAGS_fraction_of_gpu_memory_to_use=0.1 will allocate 10% of the total GPU memory size as the initial GPU chunk.
Note
-------
Windows platforms set FLAGS_fraction_of_gpu_memory_to_use to 0.5 by default.
Linux sets FLAGS_fraction_of_gpu_memory_to_use to 0.92 by default.
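A toy illustration of the chunk arithmetic described above, for a hypothetical GPU with 16 GB of memory and the Linux default fraction:

.. code-block:: python

    # Hypothetical device size; the fraction is the Linux default noted above.
    total_gpu_mb = 16 * 1024
    fraction = 0.92  # FLAGS_fraction_of_gpu_memory_to_use

    # Each chunk requested from the GPU has this size.
    chunk_mb = total_gpu_mb * fraction
    print('chunk size: %.0f MB' % chunk_mb)  # chunk size: 15073 MB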
free_idle_memory
*******************************************
(since 0.15.0)
Whether to free idle memory pre-allocated from the system during runtime. If set, idle memory will be released when the pre-allocated allocator holds too much of it.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_free_idle_memory=True will free idle memory when there is too much of it.
FLAGS_free_idle_memory=False will not free idle memory.
fuse_parameter_groups_size
*******************************************
(since 1.4.0)
FLAGS_fuse_parameter_groups_size is the size (number of parameter gradients) of one group. If fuse_parameter_groups_size is 1, the number of groups equals the number of parameter gradients. If fuse_parameter_groups_size is -1, there is only one group. The default value is 3, which is an empirical value.
Values accepted
---------------
Int32. The default value is 3.
Example
-------
FLAGS_fuse_parameter_groups_size=3 will set the size of one group of parameter gradients to 3.
init_allocated_mem
*******************************************
(since 0.15.0)
Whether to initialize the allocated memory with some non-zero values. This flag is for debugging, to catch ops that wrongly assume allocated memory is initialized to zero.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_init_allocated_mem=True will initialize the allocated memory with non-zero values.
FLAGS_init_allocated_mem=False will not initialize the allocated memory.
initial_cpu_memory_in_mb
*******************************************
(since 0.14.0)
Initial CPU memory chunk size, in MB, of the PaddlePaddle allocator. The allocator takes the minimum of FLAGS_initial_cpu_memory_in_mb and FLAGS_fraction_of_cpu_memory_to_use*(total physical memory) as the memory chunk size.
Values accepted
---------------
Uint64. The default value is 500, in MB.
Example
-------
FLAGS_initial_cpu_memory_in_mb=100: if FLAGS_fraction_of_cpu_memory_to_use*(total physical memory) > 100MB, the allocator will pre-allocate 100MB when the first allocation request arrives, and allocate another 100MB whenever the pre-allocated memory is exhausted.
initial_gpu_memory_in_mb
*******************************************
(since 1.4.0)
Allocate a chunk of GPU memory whose size, in MB, is specified by this flag. Future memory usage will be allocated from the chunk. If the chunk doesn't have enough GPU memory, additional chunks will be requested from the GPU with the size specified by FLAGS_reallocate_gpu_memory_in_mb, until the GPU has no memory left for the additional chunk.
Values accepted
---------------
Uint64 value greater than 0, which is the initial GPU memory size in MB.
Example
-------
FLAGS_initial_gpu_memory_in_mb=4096 will allocate 4 GB as the initial GPU chunk.
Note
-------
If you set this flag, the memory size set by FLAGS_fraction_of_gpu_memory_to_use will be overridden by this flag.
If you don't set this flag, PaddlePaddle will use FLAGS_fraction_of_gpu_memory_to_use to allocate GPU memory.
inner_op_parallelism
*******************************************
(since 1.3.0)
Most operators work in single-thread mode, but for some operators using multiple threads is more suitable. For example, an optimization op that optimizes a sparse gradient will be much faster with multiple threads. This flag is used to set the number of threads inside an operator.
Values accepted
---------------
Int32. The default value is 0, which means the operator will not run in multi-thread mode.
Example
-------
FLAGS_inner_op_parallelism=5 will set the number of threads inside an operator to 5.
Note
-------
Currently only the sparse adam op supports inner_op_parallelism.
limit_of_tmp_allocation
*******************************************
(since 1.3)
FLAGS_limit_of_tmp_allocation indicates the upper limit of the temporary_allocation size, in bytes. If FLAGS_limit_of_tmp_allocation is -1, the size of temporary_allocation will not be limited.
Values accepted
---------------
Int64. The default value is -1.
Example
-------
FLAGS_limit_of_tmp_allocation=1024 will set the upper limit of the temporary_allocation size to 1024 bytes.
max_body_size
*******************************************
(Since 1.0.0)
It controls the maximum message size in BRPC.
Values accepted
---------------
Int32. The default value is 2147483647.
Example
-------
FLAGS_max_body_size=2147483647 will set the BRPC message size limit to 2147483647.
memory_fraction_of_eager_deletion
*******************************************
(since 1.4)
A memory size fraction used when the garbage collection strategy decides which variables should be released. If FLAGS_memory_fraction_of_eager_deletion=1.0, all temporary variables in the network would be released. If FLAGS_memory_fraction_of_eager_deletion=0.0, no temporary variables in the network would be released. If 0.0<FLAGS_memory_fraction_of_eager_deletion<1.0, all temporary variables would be sorted in descending order of their memory size, and only the FLAGS_memory_fraction_of_eager_deletion fraction of variables with the largest memory size would be released. This flag is only valid when running a compiled program with data parallelism.
Values accepted
---------------
Double, inside [0.0, 1.0]. The default value is 1.0.
Example
-------
FLAGS_memory_fraction_of_eager_deletion=0 would keep all temporary variables, that is to say, the garbage collection strategy is disabled.
FLAGS_memory_fraction_of_eager_deletion=1 would release all temporary variables.
FLAGS_memory_fraction_of_eager_deletion=0.5 would only release the 50% of variables with the largest memory size.
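A toy illustration of the 0.5 case above (variable names and sizes are hypothetical): variables are ordered by memory size and only the largest half is released.

.. code-block:: python

    # Hypothetical temporary variables and their sizes in MB.
    var_sizes = {'a': 64, 'b': 256, 'c': 128, 'd': 32}
    fraction = 0.5  # FLAGS_memory_fraction_of_eager_deletion

    ordered = sorted(var_sizes, key=var_sizes.get, reverse=True)
    to_release = ordered[:int(len(ordered) * fraction)]
    print(to_release)  # ['b', 'c']: the 50% with the largest memory size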
multiple_of_cupti_buffer_size
*******************************************
(since 1.4.0)
This flag is used for profiling. It indicates the multiple of the CUPTI device buffer size. When you are profiling, if the program crashes or errors occur when loading the timeline file in chrome://tracing, try increasing this value.
Values accepted
---------------
Int32. The default value is 1.
Example
-------
FLAGS_multiple_of_cupti_buffer_size=1 sets the multiple of the CUPTI device buffer size to 1.
paddle_num_threads
*******************************************
(since 0.15.0)
Controls the number of threads of each paddle instance.
Values accepted
---------------
Int32. The default value is 1.
Example
-------
FLAGS_paddle_num_threads=2 will use at most 2 threads for each instance.
pe_profile_fname
*******************************************
(since 1.3.0)
This flag is used for debugging ParallelExecutor. ParallelExecutor will generate the profile result with gperftools, and the result will be stored in the file specified by FLAGS_pe_profile_fname. Only valid when compiled with `WITH_PROFILER=ON`. Leave empty to disable.
Values accepted
---------------
String. The default value is empty ("").
Example
-------
FLAGS_pe_profile_fname="./parallel_executor.perf" will store the profile result in parallel_executor.perf.
print_sub_graph_dir
*******************************************
(since 1.2.0)
This flag is used for debugging. If some subgraphs of the graph transformed from the program are disconnected, the result may be problematic. We can print such disconnected subgraphs to a file specified by this flag. Leave empty to disable.
Values accepted
---------------
String. The default value is empty ("").
Example
-------
FLAGS_print_sub_graph_dir="./sub_graphs.txt" will print the disconnected subgraphs to "./sub_graphs.txt".
reader_queue_speed_test_mode
*******************************************
(since 1.1.0)
Set the pyreader data queue to test mode. In test mode, pyreader caches some data and the executor then reads the cached data, so the reader will not be the bottleneck.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_reader_queue_speed_test_mode=True will enable the pyreader test mode.
Note
-------
This flag works only when you are using py_reader.
reallocate_gpu_memory_in_mb
*******************************************
(since 1.4.0)
Re-allocate an additional chunk of GPU memory when the allocated chunk runs out.
Values accepted
---------------
Int64 value greater than 0, in MB.
Example
-------
FLAGS_reallocate_gpu_memory_in_mb=1024 will re-allocate 1 GB when the GPU memory chunk runs out.
Note
-------
If this flag is set, PaddlePaddle will reallocate GPU memory with the size specified by this flag.
Otherwise, PaddlePaddle will reallocate with the size set by FLAGS_fraction_of_gpu_memory_to_use.
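A minimal sketch combining the two chunk flags above, assuming both override FLAGS_fraction_of_gpu_memory_to_use as the notes state:

.. code-block:: python

    import os

    # 4 GB initial chunk, then 1 GB re-allocations when it runs out.
    os.environ['FLAGS_initial_gpu_memory_in_mb'] = '4096'
    os.environ['FLAGS_reallocate_gpu_memory_in_mb'] = '1024'

    import paddle.fluid as fluid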
rpc_deadline
*******************************************
(Since 1.0.0)
It controls the deadline timeout of RPC communication.
Values accepted
---------------
Int32. The default value is 180000, in ms.
Example
-------
FLAGS_rpc_deadline=180000 will set the deadline timeout to 3 minutes.
rpc_disable_reuse_port
*******************************************
(since 1.2.0)
When rpc_disable_reuse_port is True, the grpc option GRPC_ARG_ALLOW_REUSEPORT will be set to false to disable the use of SO_REUSEPORT if it's available.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_rpc_disable_reuse_port=True will disable the use of SO_REUSEPORT.
rpc_get_thread_num
*******************************************
(Since 1.0.0)
It controls the number of threads used to get parameters from the parameter server.
Values accepted
---------------
Int32. The default value is 12.
Example
-------
FLAGS_rpc_get_thread_num=6 will use 6 threads to get parameters from the parameter server.
rpc_send_thread_num
*******************************************
(Since 1.0.0)
It controls the number of threads used for RPC sending.
Values accepted
---------------
Int32. The default value is 12.
Example
-------
FLAGS_rpc_send_thread_num=6 will set the number of threads used for sending to 6.
rpc_server_profile_path
*******************************************
(since 0.15.0)
Set the prefix of the profiler output log file path. The complete path will be rpc_server_profile_path_listener_id, where listener_id is a random number.
Values accepted
---------------
String. The default value is "./profile_ps".
Example
-------
FLAGS_rpc_server_profile_path="/tmp/pserver_profile_log" generates the profile log file at "/tmp/pserver_profile_log_listener_id".
selected_gpus
*******************************************
(since 1.3)
Set the GPU devices used for training or inference.
Values accepted
---------------
A comma-separated list of device IDs, where each device ID is a non-negative integer less than the number of GPU devices your machine has.
Example
-------
FLAGS_selected_gpus=0,1,2,3,4,5,6,7 makes GPU devices 0-7 be used for training or inference.
Note
-------
The reason for using this flag is that we want to use collective communication between GPU devices, whereas with CUDA_VISIBLE_DEVICES only shared memory can be used.
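A minimal sketch of building the device list programmatically (the four device IDs are hypothetical):

.. code-block:: python

    import os

    # Use collective communication across four GPUs.
    gpu_ids = [0, 1, 2, 3]
    os.environ['FLAGS_selected_gpus'] = ','.join(str(i) for i in gpu_ids)

    import paddle.fluid as fluid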
sync_nccl_allreduce
*******************************************
(since 1.3)
If FLAGS_sync_nccl_allreduce is True, `cudaStreamSynchronize(nccl_stream)` will be called in allreduce_op_handle; this mode can achieve better performance in some scenarios.
Values accepted
---------------
Bool. The default value is True.
Example
-------
FLAGS_sync_nccl_allreduce=True will call `cudaStreamSynchronize(nccl_stream)` in allreduce_op_handle.
times_excess_than_required_tmp_allocation
*******************************************
(since 1.3)
FLAGS_times_excess_than_required_tmp_allocation indicates the maximum size the TemporaryAllocator can return. For example, if the required memory size is N and times_excess_than_required_tmp_allocation is 2.0, the TemporaryAllocator will return an available allocation whose size is in the range N to 2*N.
Values accepted
---------------
Int64. The default value is 2.
Example
-------
FLAGS_times_excess_than_required_tmp_allocation=1024 will set the maximum size the TemporaryAllocator can return to 1024*N.
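A toy illustration of the size range described above, with a hypothetical request of N = 1024 bytes and the default factor:

.. code-block:: python

    required = 1024  # hypothetical request size N, in bytes
    factor = 2.0     # FLAGS_times_excess_than_required_tmp_allocation

    # The TemporaryAllocator may serve the request with any cached
    # allocation whose size falls in [N, factor * N].
    acceptable_range = (required, int(factor * required))
    print(acceptable_range)  # (1024, 2048)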
tracer_profile_fname
*******************************************
(since 1.4.0)
FLAGS_tracer_profile_fname indicates the profiler filename for the imperative tracer, which is generated by gperftools. Only valid when compiled with `WITH_PROFILER=ON`. Leave empty to disable.
Values accepted
---------------
String. The default value is "gperf".
Example
-------
FLAGS_tracer_profile_fname="gperf_profile_file" will set the profiler filename for the imperative tracer to "gperf_profile_file".
use_mkldnn
*******************************************
(since 0.13.0)
Gives a choice to run with the Intel MKL-DNN (https://github.com/intel/mkl-dnn) library for inference or training.

Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) is an open-source performance library for deep-learning applications. The library accelerates deep-learning applications and frameworks on Intel(R) architecture. Intel MKL-DNN contains vectorized and threaded building blocks that you can use to implement deep neural networks (DNN) with C and C++ interfaces.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_use_mkldnn=True will enable running with MKL-DNN support.
Note
-------
FLAGS_use_mkldnn is only used for Python training and inference scripts. To enable MKL-DNN in the CAPI, set the build option -DWITH_MKLDNN=ON.
Intel MKL-DNN supports the Intel 64 architecture and compatible architectures. The library is optimized for systems based on:
Intel Atom(R) processor with Intel SSE4.1 support
4th, 5th, 6th, 7th, and 8th generation Intel(R) Core(TM) processor
Intel(R) Xeon(R) processor E3, E5, and E7 family (formerly Sandy Bridge, Ivy Bridge, Haswell, and Broadwell)
Intel(R) Xeon(R) Scalable processors (formerly Skylake and Cascade Lake)
Intel(R) Xeon Phi(TM) processors (formerly Knights Landing and Knights Mill)
and compatible processors.
use_ngraph
*******************************************
(since 1.4.0)
Gives a choice to run with the Intel nGraph (https://github.com/NervanaSystems/ngraph) engine for inference or training. This can deliver a significant performance boost on Intel Xeon CPUs.
Values accepted
---------------
Bool. The default value is False.
Example
-------
FLAGS_use_ngraph=True will enable running with nGraph support.
Note
-------
Intel nGraph is only supported for a few models so far. We have only verified `ResNet-50 <https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/README_ngraph.md>`_ training and inference.
use_pinned_memory
*******************************************
(since 0.12.0)
Whether to use CPU pinned memory. If set, the CPU allocator calls mlock to lock pages.
Values accepted
---------------
Bool. The default value is True.
Example
-------
FLAGS_use_pinned_memory=True would lock the pages of allocated CPU memory.