Created by: zhangting2020
`init_on_cpu` is used to force a variable to be initialized on the CPU, or to force the CPU kernel of an op to be executed. Variables related to the learning rate do not participate in the model's computation, so there is no need to compute them on the GPU. However, using `init_on_cpu` in a model does not always take effect, because it only works for the ops that have a `force_cpu` attribute.
test case 1:

```python
import paddle.fluid as fluid
from paddle.fluid.initializer import init_on_cpu
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter

LEARNING_RATE = 0.01
TOTAL_STEP = 30000
POWER = 0.9

def poly_decay():
    global_step = _decay_step_counter()
    with init_on_cpu():
        decayed_lr = LEARNING_RATE * (fluid.layers.pow(
            (1 - global_step / TOTAL_STEP), POWER))
    return decayed_lr

lr = poly_decay()
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
result = exe.run(fetch_list=[lr])
```
log 1: at runtime, the CUDA kernel for the `pow` op is executed despite the `init_on_cpu` context:

```
I0106 08:42:36.204762 7559 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[tmp_0:float[1]({})]}, outputs:{Out[tmp_1:float[1]({})]}.
I0106 08:42:36.204845 7559 executor_gc_helper.cc:166] Erase variable tmp_0
I0106 08:42:36.204968 7559 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:42:36.205127 7559 operator.cc:975] CUDAPlace(0) Op(pow), inputs:{FactorTensor[], X[tmp_1:float[1]({})]}, outputs:{Out[pow_0.tmp_0:float[1]({})]}.
I0106 08:42:36.205235 7559 executor_gc_helper.cc:166] Erase variable tmp_1
```
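The log above matches a simple dispatch rule: the expected kernel place defaults to the executor's place, and only a `force_cpu=True` attribute overrides it. Since `pow` carries no such attribute, `init_on_cpu` cannot influence it. A hypothetical sketch of that rule (not Paddle's real dispatch code):

```python
# Hypothetical sketch of the kernel-place rule the log illustrates:
# expected_kernel_key.place defaults to the executor's place unless the
# op carries force_cpu=True.

def expected_kernel_place(executor_place, op_attrs):
    """Return the place whose kernel will be chosen for this op."""
    if op_attrs.get("force_cpu", False):
        return "CPUPlace"
    return executor_place

# pow has no force_cpu attr, so its CUDA kernel is selected:
print(expected_kernel_place("CUDAPlace(0)", {}))                   # CUDAPlace(0)
# fill_constant can carry force_cpu=True, so it can land on CPU:
print(expected_kernel_place("CUDAPlace(0)", {"force_cpu": True}))  # CPUPlace
```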
test case 2:

```python
import math
import paddle.fluid as fluid
import paddle.fluid.layers.ops as ops
from paddle.fluid.initializer import init_on_cpu
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter

def cosine_decay_with_warmup(learning_rate, step_each_epoch, epochs=120):
    """Applies cosine decay to the learning rate.
    lr = 0.05 * (math.cos(epoch * (math.pi / 120)) + 1)
    decrease lr for every mini-batch and start with warmup.
    """
    global_step = _decay_step_counter()
    lr = fluid.layers.tensor.create_global_var(
        shape=[1],
        value=0.0,
        dtype='float32',
        persistable=True,
        name="learning_rate")
    warmup_epoch = fluid.layers.fill_constant(
        shape=[1], dtype='float32', value=float(5), force_cpu=True)
    with init_on_cpu():
        epoch = ops.floor(global_step / step_each_epoch)
        with fluid.layers.control_flow.Switch() as switch:
            with switch.case(epoch < warmup_epoch):
                decayed_lr = learning_rate * (global_step /
                                              (step_each_epoch * warmup_epoch))
                fluid.layers.tensor.assign(input=decayed_lr, output=lr)
            with switch.default():
                decayed_lr = learning_rate * \
                    (ops.cos((global_step - warmup_epoch * step_each_epoch)
                             * (math.pi / (epochs * step_each_epoch))) + 1) / 2
                fluid.layers.tensor.assign(input=decayed_lr, output=lr)
    return lr

lr = cosine_decay_with_warmup(0.001, 1000)
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
result = exe.run(fetch_list=[lr])
```
log 2:
- The CUDA kernel for the `floor` op is executed.
- The CPU kernel for the `less_than` op is executed, because this kernel is chosen by default.
- The CPU kernel for the `logical_not` op is executed, because the kernel is chosen according to the place of its input `X`, which is the output of `less_than`.
- The CUDA kernels for `elementwise_div` and `assign` are executed.
```
I0106 08:01:52.546391 7295 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[cast_0.tmp_0:float[1]({})]}, outputs:{Out[tmp_0:float[1]({})]}.
I0106 08:01:52.546515 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.546736 7295 operator.cc:975] CUDAPlace(0) Op(floor), inputs:{X[tmp_0:float[1]({})]}, outputs:{Out[floor_0.tmp_0:float[1]({})]}.
I0106 08:01:52.546887 7295 executor_gc_helper.cc:166] Erase variable tmp_0
I0106 08:01:52.546960 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CPUPlace]:library_type[PLAIN]
I0106 08:01:52.547022 7295 operator.cc:1155] Transform Variable floor_0.tmp_0 from data_type[float]:data_layout[NCHW]:place[CUDAPlace(0)]:library_type[PLAIN] to data_type[float]:data_layout[ANY_LAYOUT]:place[CPUPlace]:library_type[PLAIN]
I0106 08:01:52.547152 7295 scope.cc:169] Create variable floor_0.tmp_0
I0106 08:01:52.547196 7295 data_device_transform.cc:21] DeviceTransform in, src_place CUDAPlace(0) dst_place: CPUPlace
I0106 08:01:52.547309 7295 tensor_util.cu:129] TensorCopySync 1 from CUDAPlace(0) to CPUPlace
I0106 08:01:52.547488 7295 operator.cc:975] CPUPlace Op(less_than), inputs:{X[floor_0.tmp_0:float[1]({})], Y[fill_constant_0.tmp_0:float[1]({})]}, outputs:{Out[tmp_1:bool[1]({})]}.
I0106 08:01:52.547566 7295 executor_gc_helper.cc:166] Erase variable floor_0.tmp_0
I0106 08:01:52.547648 7295 operator.cc:1060] expected_kernel_key:data_type[bool]:data_layout[ANY_LAYOUT]:place[CPUPlace]:library_type[PLAIN]
I0106 08:01:52.547739 7295 operator.cc:975] CPUPlace Op(logical_not), inputs:{X[tmp_1:bool[1]({})]}, outputs:{Out[logical_not_0.tmp_0:bool[1]({})]}.
I0106 08:01:52.547811 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.547873 7295 operator.cc:1155] Transform Variable fill_constant_0.tmp_0 from data_type[float]:data_layout[NCHW]:place[CPUPlace]:library_type[PLAIN] to data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.547971 7295 scope.cc:169] Create variable fill_constant_0.tmp_0
I0106 08:01:52.548050 7295 data_device_transform.cc:21] DeviceTransform in, src_place CPUPlace dst_place: CUDAPlace(0)
I0106 08:01:52.548118 7295 tensor_util.cu:129] TensorCopySync 1 from CPUPlace to CUDAPlace(0)
I0106 08:01:52.548285 7295 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[fill_constant_0.tmp_0:float[1]({})]}, outputs:{Out[tmp_2:float[1]({})]}.
I0106 08:01:52.548383 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.548444 7295 operator.cc:1155] Transform Variable cast_0.tmp_0 from data_type[float]:data_layout[NCHW]:place[CPUPlace]:library_type[PLAIN] to data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.548502 7295 scope.cc:169] Create variable cast_0.tmp_0
I0106 08:01:52.548607 7295 data_device_transform.cc:21] DeviceTransform in, src_place CPUPlace dst_place: CUDAPlace(0)
I0106 08:01:52.548699 7295 tensor_util.cu:129] TensorCopySync 1 from CPUPlace to CUDAPlace(0)
I0106 08:01:52.548949 7295 operator.cc:975] CUDAPlace(0) Op(elementwise_div), inputs:{X[cast_0.tmp_0:float[1]({})], Y[tmp_2:float[1]({})]}, outputs:{Out[tmp_3:float[1]({})]}.
I0106 08:01:52.549113 7295 executor_gc_helper.cc:166] Erase variable tmp_2
I0106 08:01:52.549221 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.549465 7295 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[tmp_3:float[1]({})]}, outputs:{Out[tmp_4:float[1]({})]}.
I0106 08:01:52.549556 7295 executor_gc_helper.cc:166] Erase variable tmp_3
I0106 08:01:52.549700 7295 conditional_block_op.cc:64] Conditional block.idx = 1, scope = 0x7f1e8166e8f0
I0106 08:01:52.549901 7295 executor.cc:123] Creating Variables for block 1
I0106 08:01:52.550024 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.550144 7295 tensor_util.cu:34] TensorCopy 1 from CUDAPlace(0) to CUDAPlace(0)
I0106 08:01:52.550418 7295 operator.cc:975] CUDAPlace(0) Op(assign), inputs:{X[tmp_4:float[1]({})]}, outputs:{Out[learning_rate:float[1]({})]}.
```
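The practical cost of the mixed placement above is the `DeviceTransform`/`TensorCopySync` pairs: whenever an op's expected place differs from the place where an input tensor currently lives, the framework inserts a synchronous copy. A pure-Python sketch of that bookkeeping (the helper and the op list are illustrative reconstructions from the log, not Paddle code) counts the cross-device copies in this snippet:

```python
# Illustrative sketch: count cross-device copies implied by a schedule of
# (op, expected place, inputs, output). Hypothetical helper, not Paddle code.

def count_transfers(ops, tensor_place):
    """ops: list of (name, expected_place, input_names, output_name)."""
    copies = 0
    for name, place, inputs, out in ops:
        for x in inputs:
            if tensor_place[x] != place:
                copies += 1           # DeviceTransform + TensorCopySync
                tensor_place[x] = place
        tensor_place[out] = place     # output is produced on the op's place
    return copies

# Initial placements and op schedule reconstructed from log 2:
places = {"tmp_0": "GPU", "fill_constant_0.tmp_0": "CPU", "cast_0.tmp_0": "CPU"}
ops = [
    ("floor",           "GPU", ["tmp_0"], "floor_0.tmp_0"),
    ("less_than",       "CPU", ["floor_0.tmp_0", "fill_constant_0.tmp_0"], "tmp_1"),
    ("logical_not",     "CPU", ["tmp_1"], "logical_not_0.tmp_0"),
    ("scale",           "GPU", ["fill_constant_0.tmp_0"], "tmp_2"),
    ("elementwise_div", "GPU", ["cast_0.tmp_0", "tmp_2"], "tmp_3"),
]

n = count_transfers(ops, dict(places))
print(n)  # 3 -- matching the three TensorCopySync lines in log 2
```

Each of these synchronous copies stalls the GPU stream, which is why partially honored `init_on_cpu` placement can be worse than simply leaving the learning-rate computation on one device.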