Created by: zhangting2020
`init_on_cpu` is used to force a variable to be initialized on the CPU, or to force the CPU kernel of an op to be executed. Variables related to the learning rate do not participate in the model's computation, so there is no need to compute them on the GPU. However, using `init_on_cpu` in a model does not always take effect, because it only works for the ops that have a `force_cpu` attribute.
test case 1:

```python
import paddle.fluid as fluid
from paddle.fluid.initializer import init_on_cpu
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter

LEARNING_RATE = 0.01
TOTAL_STEP = 30000
POWER = 0.9

def poly_decay():
    global_step = _decay_step_counter()
    with init_on_cpu():
        decayed_lr = LEARNING_RATE * (fluid.layers.pow(
            (1 - global_step / TOTAL_STEP), POWER))
    return decayed_lr

lr = poly_decay()
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
result = exe.run(fetch_list=[lr])
```
log 1: at runtime, the CUDA kernel for the `pow` op is executed despite the `init_on_cpu` context:

```
I0106 08:42:36.204762 7559 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[tmp_0:float[1]({})]}, outputs:{Out[tmp_1:float[1]({})]}.
I0106 08:42:36.204845 7559 executor_gc_helper.cc:166] Erase variable tmp_0
I0106 08:42:36.204968 7559 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:42:36.205127 7559 operator.cc:975] CUDAPlace(0) Op(pow), inputs:{FactorTensor[], X[tmp_1:float[1]({})]}, outputs:{Out[pow_0.tmp_0:float[1]({})]}.
I0106 08:42:36.205235 7559 executor_gc_helper.cc:166] Erase variable tmp_1
```
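The log above matches a simple dispatch rule: the expected kernel place defaults to the executor's place, and only a `force_cpu=True` attribute overrides it. Since `pow` carries no such attribute, `init_on_cpu` cannot influence it. A hypothetical sketch of that rule (not Paddle's real dispatch code):

```python
# Hypothetical sketch of the kernel-place rule the log illustrates:
# expected_kernel_key.place defaults to the executor's place unless the
# op carries force_cpu=True.

def expected_kernel_place(executor_place, op_attrs):
    """Return the place whose kernel will be chosen for this op."""
    if op_attrs.get("force_cpu", False):
        return "CPUPlace"
    return executor_place

# pow has no force_cpu attr, so its CUDA kernel is selected:
print(expected_kernel_place("CUDAPlace(0)", {}))                   # CUDAPlace(0)
# fill_constant can carry force_cpu=True, so it can land on CPU:
print(expected_kernel_place("CUDAPlace(0)", {"force_cpu": True}))  # CPUPlace
```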
test case 2:

```python
import math
import paddle.fluid as fluid
import paddle.fluid.layers.ops as ops
from paddle.fluid.initializer import init_on_cpu
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter

def cosine_decay_with_warmup(learning_rate, step_each_epoch, epochs=120):
    """Applies cosine decay to the learning rate.
    lr = 0.05 * (math.cos(epoch * (math.pi / 120)) + 1)
    decrease lr for every mini-batch and start with warmup.
    """
    global_step = _decay_step_counter()
    lr = fluid.layers.tensor.create_global_var(
        shape=[1],
        value=0.0,
        dtype='float32',
        persistable=True,
        name="learning_rate")
    warmup_epoch = fluid.layers.fill_constant(
        shape=[1], dtype='float32', value=float(5), force_cpu=True)
    with init_on_cpu():
        epoch = ops.floor(global_step / step_each_epoch)
        with fluid.layers.control_flow.Switch() as switch:
            with switch.case(epoch < warmup_epoch):
                decayed_lr = learning_rate * (global_step /
                                              (step_each_epoch * warmup_epoch))
                fluid.layers.tensor.assign(input=decayed_lr, output=lr)
            with switch.default():
                decayed_lr = learning_rate * \
                    (ops.cos((global_step - warmup_epoch * step_each_epoch)
                             * (math.pi / (epochs * step_each_epoch))) + 1) / 2
                fluid.layers.tensor.assign(input=decayed_lr, output=lr)
    return lr

lr = cosine_decay_with_warmup(0.001, 1000)
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
result = exe.run(fetch_list=[lr])
```
log 2:
- The CUDA kernel for the `floor` op is executed.
- The CPU kernel for the `less_than` op is executed, because this kernel is chosen by default.
- The CPU kernel for the `logical_not` op is executed, because the kernel is chosen according to the place of its input `X`, which is the output of `less_than`.
- The CUDA kernels for `elementwise_div` and `assign` are executed.
```
I0106 08:01:52.546391 7295 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[cast_0.tmp_0:float[1]({})]}, outputs:{Out[tmp_0:float[1]({})]}.
I0106 08:01:52.546515 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.546736 7295 operator.cc:975] CUDAPlace(0) Op(floor), inputs:{X[tmp_0:float[1]({})]}, outputs:{Out[floor_0.tmp_0:float[1]({})]}.
I0106 08:01:52.546887 7295 executor_gc_helper.cc:166] Erase variable tmp_0
I0106 08:01:52.546960 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CPUPlace]:library_type[PLAIN]
I0106 08:01:52.547022 7295 operator.cc:1155] Transform Variable floor_0.tmp_0 from data_type[float]:data_layout[NCHW]:place[CUDAPlace(0)]:library_type[PLAIN] to data_type[float]:data_layout[ANY_LAYOUT]:place[CPUPlace]:library_type[PLAIN]
I0106 08:01:52.547152 7295 scope.cc:169] Create variable floor_0.tmp_0
I0106 08:01:52.547196 7295 data_device_transform.cc:21] DeviceTransform in, src_place CUDAPlace(0) dst_place: CPUPlace
I0106 08:01:52.547309 7295 tensor_util.cu:129] TensorCopySync 1 from CUDAPlace(0) to CPUPlace
I0106 08:01:52.547488 7295 operator.cc:975] CPUPlace Op(less_than), inputs:{X[floor_0.tmp_0:float[1]({})], Y[fill_constant_0.tmp_0:float[1]({})]}, outputs:{Out[tmp_1:bool[1]({})]}.
I0106 08:01:52.547566 7295 executor_gc_helper.cc:166] Erase variable floor_0.tmp_0
I0106 08:01:52.547648 7295 operator.cc:1060] expected_kernel_key:data_type[bool]:data_layout[ANY_LAYOUT]:place[CPUPlace]:library_type[PLAIN]
I0106 08:01:52.547739 7295 operator.cc:975] CPUPlace Op(logical_not), inputs:{X[tmp_1:bool[1]({})]}, outputs:{Out[logical_not_0.tmp_0:bool[1]({})]}.
I0106 08:01:52.547811 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.547873 7295 operator.cc:1155] Transform Variable fill_constant_0.tmp_0 from data_type[float]:data_layout[NCHW]:place[CPUPlace]:library_type[PLAIN] to data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.547971 7295 scope.cc:169] Create variable fill_constant_0.tmp_0
I0106 08:01:52.548050 7295 data_device_transform.cc:21] DeviceTransform in, src_place CPUPlace dst_place: CUDAPlace(0)
I0106 08:01:52.548118 7295 tensor_util.cu:129] TensorCopySync 1 from CPUPlace to CUDAPlace(0)
I0106 08:01:52.548285 7295 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[fill_constant_0.tmp_0:float[1]({})]}, outputs:{Out[tmp_2:float[1]({})]}.
I0106 08:01:52.548383 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.548444 7295 operator.cc:1155] Transform Variable cast_0.tmp_0 from data_type[float]:data_layout[NCHW]:place[CPUPlace]:library_type[PLAIN] to data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.548502 7295 scope.cc:169] Create variable cast_0.tmp_0
I0106 08:01:52.548607 7295 data_device_transform.cc:21] DeviceTransform in, src_place CPUPlace dst_place: CUDAPlace(0)
I0106 08:01:52.548699 7295 tensor_util.cu:129] TensorCopySync 1 from CPUPlace to CUDAPlace(0)
I0106 08:01:52.548949 7295 operator.cc:975] CUDAPlace(0) Op(elementwise_div), inputs:{X[cast_0.tmp_0:float[1]({})], Y[tmp_2:float[1]({})]}, outputs:{Out[tmp_3:float[1]({})]}.
I0106 08:01:52.549113 7295 executor_gc_helper.cc:166] Erase variable tmp_2
I0106 08:01:52.549221 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.549465 7295 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[tmp_3:float[1]({})]}, outputs:{Out[tmp_4:float[1]({})]}.
I0106 08:01:52.549556 7295 executor_gc_helper.cc:166] Erase variable tmp_3
I0106 08:01:52.549700 7295 conditional_block_op.cc:64] Conditional block.idx = 1, scope = 0x7f1e8166e8f0
I0106 08:01:52.549901 7295 executor.cc:123] Creating Variables for block 1
I0106 08:01:52.550024 7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.550144 7295 tensor_util.cu:34] TensorCopy 1 from CUDAPlace(0) to CUDAPlace(0)
I0106 08:01:52.550418 7295 operator.cc:975] CUDAPlace(0) Op(assign), inputs:{X[tmp_4:float[1]({})]}, outputs:{Out[learning_rate:float[1]({})]}.
```
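The practical cost of the mixed placement above is the `DeviceTransform`/`TensorCopySync` pairs: whenever an op's expected place differs from the place where an input tensor currently lives, the framework inserts a synchronous copy. A pure-Python sketch of that bookkeeping (the helper and the op list are illustrative reconstructions from the log, not Paddle code) counts the cross-device copies in this snippet:

```python
# Illustrative sketch: count cross-device copies implied by a schedule of
# (op, expected place, inputs, output). Hypothetical helper, not Paddle code.

def count_transfers(ops, tensor_place):
    """ops: list of (name, expected_place, input_names, output_name)."""
    copies = 0
    for name, place, inputs, out in ops:
        for x in inputs:
            if tensor_place[x] != place:
                copies += 1           # DeviceTransform + TensorCopySync
                tensor_place[x] = place
        tensor_place[out] = place     # output is produced on the op's place
    return copies

# Initial placements and op schedule reconstructed from log 2:
places = {"tmp_0": "GPU", "fill_constant_0.tmp_0": "CPU", "cast_0.tmp_0": "CPU"}
ops = [
    ("floor",           "GPU", ["tmp_0"], "floor_0.tmp_0"),
    ("less_than",       "CPU", ["floor_0.tmp_0", "fill_constant_0.tmp_0"], "tmp_1"),
    ("logical_not",     "CPU", ["tmp_1"], "logical_not_0.tmp_0"),
    ("scale",           "GPU", ["fill_constant_0.tmp_0"], "tmp_2"),
    ("elementwise_div", "GPU", ["cast_0.tmp_0", "tmp_2"], "tmp_3"),
]

n = count_transfers(ops, dict(places))
print(n)  # 3 -- matching the three TensorCopySync lines in log 2
```

Each of these synchronous copies stalls the GPU stream, which is why partially honored `init_on_cpu` placement can be worse than simply leaving the learning-rate computation on one device.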