memory management
==================


FLAGS_allocator_strategy
**************************************
(since 1.2)

Used to choose the allocator strategy of PaddlePaddle.

Values accepted
---------------
String, enum in ['naive_best_fit', 'auto_growth']. The default value is 'naive_best_fit' if users compile PaddlePaddle with the -DON_INFER=ON CMake flag; otherwise it is 'auto_growth'. The default PaddlePaddle pip package uses 'auto_growth'.

Example
--------
FLAGS_allocator_strategy=naive_best_fit would use the pre-allocated best-fit allocator. The 'naive_best_fit' strategy occupies almost all GPU memory by default but leads to less memory fragmentation (i.e., the maximum batch size of models may be larger).

FLAGS_allocator_strategy=auto_growth would use the auto-growth allocator. The 'auto_growth' strategy allocates GPU memory on demand but may lead to more memory fragmentation (i.e., the maximum batch size of models may be smaller).
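
These flags are usually set before PaddlePaddle starts. Below is a minimal sketch of one common way to do this from Python, assuming the flag is read from the environment when the framework initializes:

.. code-block:: python

    import os

    # Select the allocator strategy before PaddlePaddle is imported,
    # so the flag is already visible when the framework initializes.
    os.environ['FLAGS_allocator_strategy'] = 'auto_growth'

    import paddle.fluid as fluid  # the flag takes effect at startup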



FLAGS_eager_delete_scope
*******************************************
(since 0.12.0)

Makes scope deletion synchronous. If set, it reduces GPU memory usage but slows down the destruction of variables (about 1% performance loss).

Values accepted
---------------
Bool. The default value is True.

Example
-------
FLAGS_eager_delete_scope=True will make scope deletion synchronous.


FLAGS_eager_delete_tensor_gb
*******************************************
(since 1.0.0)

Whether to use the garbage collection strategy to optimize the memory usage of the network. If FLAGS_eager_delete_tensor_gb < 0, the garbage collection strategy is disabled. If FLAGS_eager_delete_tensor_gb >= 0, the garbage collection strategy is enabled and collects memory garbage while the network runs, which is beneficial for saving memory. It only takes effect when you use Executor to run a program, or compile a program, or compile a program with data parallelism. The garbage collector does not release memory garbage until the total size of the garbage reaches FLAGS_eager_delete_tensor_gb GB.

Values accepted
---------------
Double, in GB. The default value is 0.0.

Example
-------
FLAGS_eager_delete_tensor_gb=0.0 releases memory garbage once its total size reaches 0.0 GB, i.e., immediately after any garbage appears.

FLAGS_eager_delete_tensor_gb=1.0 releases memory garbage once its total size reaches 1.0 GB.

FLAGS_eager_delete_tensor_gb=-1.0 disables the garbage collection strategy.

Note
-------
It is recommended to enable the garbage collection strategy by setting FLAGS_eager_delete_tensor_gb=0.0 when training large networks.
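
The threshold behavior described above can be pictured with a small sketch (hypothetical names, not PaddlePaddle's internal implementation): garbage is accumulated and released only once its total size reaches the configured threshold.

.. code-block:: python

    class GarbageCollectorSketch:
        """Illustrates the FLAGS_eager_delete_tensor_gb semantics (hypothetical)."""

        def __init__(self, threshold_gb):
            self.threshold_bytes = threshold_gb * (1 << 30)
            self.pending_bytes = 0

        def on_tensor_dead(self, nbytes):
            # Accumulate garbage; release only once the threshold is reached.
            self.pending_bytes += nbytes
            if self.pending_bytes >= self.threshold_bytes:
                self.release_all()

        def release_all(self):
            print('releasing %d bytes of garbage' % self.pending_bytes)
            self.pending_bytes = 0

    gc = GarbageCollectorSketch(threshold_gb=0.0)  # release immediately
    gc.on_tensor_dead(4096)                        # -> releasing 4096 bytes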


FLAGS_fast_eager_deletion_mode
*******************************************
(since 1.3)

Whether to use the fast garbage collection strategy. If not set, GPU memory is released only after the CUDA kernel that uses it has finished. Otherwise, GPU memory is released without waiting for the CUDA kernel to finish, which makes the garbage collection strategy faster. Only valid when the garbage collection strategy is enabled.

Values accepted
---------------
Bool. The default value is True.

Example
-------
FLAGS_fast_eager_deletion_mode=True would turn on the fast garbage collection strategy.

FLAGS_fast_eager_deletion_mode=False would turn off the fast garbage collection strategy.

FLAGS_fraction_of_cpu_memory_to_use
*******************************************
(since 1.2.0)

Allocate a chunk of CPU memory whose size is this fraction of the total CPU memory. Future memory usage will be allocated from this chunk. If the chunk does not have enough CPU memory, additional chunks of the same size will be requested from the CPU until the CPU has no memory left for another chunk.

Values accepted
---------------
Double value in range [0, 1], which is the initial CPU memory percentage. The default value is 1.0.

Example
-------
FLAGS_fraction_of_cpu_memory_to_use=0.1 will allocate 10% of the total CPU memory size as the initial CPU chunk.


FLAGS_fraction_of_cuda_pinned_memory_to_use
*******************************************
(since 1.2.0)

Allocate a chunk of CUDA pinned memory whose size is this fraction of the total CPU memory. Future memory usage will be allocated from this chunk. If the chunk does not have enough CPU memory, additional chunks of the same size will be requested from the CPU until the CPU has no memory left for another chunk.

Values accepted
---------------
Double value in range [0, 1], which is the initial CUDA pinned memory percentage. The default value is 0.5.

Example
-------
FLAGS_fraction_of_cuda_pinned_memory_to_use=0.1 will allocate 10% of the total CPU memory size as the initial CUDA pinned chunk.


FLAGS_fraction_of_gpu_memory_to_use
*******************************************
(since 1.2.0)

Allocate a chunk of GPU memory whose size is this fraction of the available GPU memory. Future memory usage will be allocated from this chunk. If the chunk does not have enough GPU memory, additional chunks of the same size will be requested from the GPU until the GPU has no memory left for another chunk.

Values accepted
---------------
Double value in range [0, 1], which is the initial GPU memory percentage.

Example
-------
FLAGS_fraction_of_gpu_memory_to_use=0.1 will allocate 10% of the available GPU memory size as the initial GPU chunk.

Note
-------
On Windows platforms, FLAGS_fraction_of_gpu_memory_to_use defaults to 0.5.
On Linux, it defaults to 0.92.
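
For concreteness, here is a small sketch of the arithmetic implied by this flag (a hypothetical helper, not framework code):

.. code-block:: python

    def initial_gpu_chunk_mb(available_gpu_mb, fraction):
        # The initial chunk is this fraction of the *available* GPU memory.
        return available_gpu_mb * fraction

    # With 16384 MB of available GPU memory and the Linux default of 0.92:
    print(initial_gpu_chunk_mb(16384, 0.92))  # 15073.28 (MB)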


FLAGS_fuse_parameter_groups_size
*******************************************
(since 1.4.0)

FLAGS_fuse_parameter_groups_size is the number of parameters' gradients fused into one group. If fuse_parameter_groups_size is 1, each group contains a single parameter's gradient, so the number of groups equals the number of parameters' gradients. If fuse_parameter_groups_size is -1, all gradients are fused into a single group. The default value is 3, which is an empirical value.

Values accepted
---------------
Int32. The default value is 3.

Example
-------
FLAGS_fuse_parameter_groups_size=3 will fuse every 3 parameters' gradients into one group.
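
The count-based grouping can be sketched as follows (a hypothetical helper; the real fusion logic lives inside PaddlePaddle): every FLAGS_fuse_parameter_groups_size consecutive gradients form one group.

.. code-block:: python

    def group_by_count(grad_names, group_size=3):
        # Pack every `group_size` consecutive gradients into one fused group.
        return [grad_names[i:i + group_size]
                for i in range(0, len(grad_names), group_size)]

    print(group_by_count(['g0', 'g1', 'g2', 'g3', 'g4']))
    # [['g0', 'g1', 'g2'], ['g3', 'g4']]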



FLAGS_fuse_parameter_memory_size
*******************************************
(since 1.5.0)

FLAGS_fuse_parameter_memory_size indicates the upper memory limit, in megabytes, of one group of parameters' gradients, where each group is the input of a communication call (e.g. NCCLAllReduce). The default value is -1.0, which means groups are not formed according to memory size.

Values accepted
---------------
Double. The default value is -1.0.

Example
-------
FLAGS_fuse_parameter_memory_size=16 sets the upper memory limit of one group of parameters' gradients to 16 megabytes.
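
The memory-capped grouping can be sketched the same way (again a hypothetical helper): gradients are packed into a group until adding the next one would exceed the limit, and each finished group becomes one input of a communication call such as NCCLAllReduce.

.. code-block:: python

    def group_by_memory(grad_sizes_mb, cap_mb=16.0):
        groups, current, current_mb = [], [], 0.0
        for size in grad_sizes_mb:
            # Close the current group if adding this gradient
            # would exceed the memory cap.
            if current and current_mb + size > cap_mb:
                groups.append(current)
                current, current_mb = [], 0.0
            current.append(size)
            current_mb += size
        if current:
            groups.append(current)
        return groups

    print(group_by_memory([4.0, 8.0, 6.0, 10.0, 2.0]))
    # [[4.0, 8.0], [6.0, 10.0], [2.0]]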


FLAGS_init_allocated_mem
*******************************************
(since 0.15.0)

Whether to initialize allocated memory with some non-zero values. This flag is for debugging, to catch ops that wrongly assume the allocated memory is zero-initialized.

Values accepted
---------------
Bool. The default value is False.

Example
-------
FLAGS_init_allocated_mem=True will initialize allocated memory with non-zero values.

FLAGS_init_allocated_mem=False will not initialize the allocated memory.


FLAGS_initial_cpu_memory_in_mb
*******************************************
(since 0.14.0)

Initial CPU memory chunk size in MB of the PaddlePaddle allocator. The allocator takes the minimum of FLAGS_initial_cpu_memory_in_mb and FLAGS_fraction_of_cpu_memory_to_use * (total physical memory) as the memory chunk size.

Values accepted
---------------
Uint64. The default value is 500, in MB.

Example
-------
FLAGS_initial_cpu_memory_in_mb=100: if FLAGS_fraction_of_cpu_memory_to_use * (total physical memory) > 100 MB, the allocator will pre-allocate 100 MB when the first allocation request arrives, and allocate another 100 MB once the pre-allocated memory is exhausted.
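
The chunk-size rule above is simply the minimum of two candidates; a hypothetical helper makes the example concrete:

.. code-block:: python

    def cpu_chunk_mb(initial_mb, fraction, total_physical_mb):
        # The allocator takes the smaller of the two candidate sizes.
        return min(initial_mb, fraction * total_physical_mb)

    # FLAGS_initial_cpu_memory_in_mb=100 on a 32 GB machine with the
    # default FLAGS_fraction_of_cpu_memory_to_use=1.0:
    print(cpu_chunk_mb(100, 1.0, 32 * 1024))  # 100 (MB)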


FLAGS_initial_gpu_memory_in_mb
*******************************************
(since 1.4.0)

Allocate a chunk of GPU memory whose byte size is specified by this flag. Future memory usage will be allocated from this chunk. If the chunk does not have enough GPU memory, additional chunks of GPU memory will be requested from the GPU, with the size specified by FLAGS_reallocate_gpu_memory_in_mb, until the GPU has no memory left for an additional chunk.

Values accepted
---------------
Uint64 value greater than 0, which is the initial GPU memory size in MB.

Example
-------
FLAGS_initial_gpu_memory_in_mb=4096 will allocate 4 GB as initial GPU chunk.

Note
-------
If you set this flag, the memory size set by FLAGS_fraction_of_gpu_memory_to_use will be overridden by this flag, and PaddlePaddle will allocate the initial GPU memory with the size specified by this flag.
If you don't set this flag, the default value 0 will disable this GPU memory strategy, and PaddlePaddle will use FLAGS_fraction_of_gpu_memory_to_use to allocate the initial GPU chunk.
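
Below is a minimal sketch of switching to this fixed-size strategy from Python, assuming the flags are read from the environment when the framework initializes (see also FLAGS_reallocate_gpu_memory_in_mb below):

.. code-block:: python

    import os

    # Use fixed-size GPU chunks instead of the fraction-based strategy.
    os.environ['FLAGS_initial_gpu_memory_in_mb'] = '4096'     # first chunk: 4 GB
    os.environ['FLAGS_reallocate_gpu_memory_in_mb'] = '1024'  # later chunks: 1 GB

    import paddle.fluid as fluid  # the flags take effect at startup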



FLAGS_memory_fraction_of_eager_deletion
*******************************************
(since 1.4)

A memory size fraction used by the garbage collection strategy to decide which variables should be released. If FLAGS_memory_fraction_of_eager_deletion=1.0, all temporary variables in the network would be released. If FLAGS_memory_fraction_of_eager_deletion=0.0, no temporary variable in the network would be released. If 0.0<FLAGS_memory_fraction_of_eager_deletion<1.0, all temporary variables would be sorted in descending order by memory size, and only the FLAGS_memory_fraction_of_eager_deletion fraction of variables with the largest memory size would be released. This flag is only valid when running a compiled program with data parallelism.

Values accepted
---------------
Double, inside [0.0, 1.0]. The default value is 1.0.

Example
-------
FLAGS_memory_fraction_of_eager_deletion=0 would keep all temporary variables, i.e., disable the garbage collection strategy.

FLAGS_memory_fraction_of_eager_deletion=1 would release all temporary variables.

FLAGS_memory_fraction_of_eager_deletion=0.5 would only release the 50% of variables with the largest memory size.
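
The selection rule can be sketched as follows (a hypothetical helper, not framework code): sort the temporary variables by memory size in descending order and release only the leading fraction.

.. code-block:: python

    def vars_to_release(var_sizes, fraction):
        # Largest variables first; release only the leading fraction of them.
        ordered = sorted(var_sizes, reverse=True)
        count = int(len(ordered) * fraction)
        return ordered[:count]

    print(vars_to_release([10, 40, 20, 30], fraction=0.5))  # [40, 30]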


FLAGS_reallocate_gpu_memory_in_mb
*******************************************
(since 1.4.0)

Re-allocate an additional GPU chunk when the allocated GPU memory chunks run out.

Values accepted
---------------
Int64 value greater than 0, which is the re-allocated GPU memory size in MB.

Example
-------
FLAGS_reallocate_gpu_memory_in_mb=1024 will re-allocate 1 GB when the current GPU memory chunks run out.

Note
-------
If this flag is set, the memory size set by FLAGS_fraction_of_gpu_memory_to_use will be overridden by this flag, and PaddlePaddle will re-allocate GPU memory with the size specified by this flag.
If you don't set this flag, the default value 0 will disable this GPU memory strategy, and PaddlePaddle will use FLAGS_fraction_of_gpu_memory_to_use to re-allocate GPU memory.


FLAGS_use_pinned_memory
*******************************************
(since 0.12.0)

Whether to use CPU pinned memory. If set, the CPU allocator calls mlock to lock pages.

Values accepted
---------------
Bool. The default value is True.

Example
-------
FLAGS_use_pinned_memory=True would lock the pages of allocated CPU memory.