remove init_on_cpu from models !4164

!4164 Merged on Jan 06, 2020, created by saxon_zh@saxon_zh

Created by: zhangting2020

init_on_cpu is used to force a variable to be initialized on the CPU, or to force the CPU kernel to be executed. Variables related to the learning rate do not participate in the model's computation, so there is no need to compute them on the GPU. However, the usage of init_on_cpu in these models does not take effect, because it only works for the Ops that have a force_cpu attribute.
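One way to see which Ops could honor init_on_cpu is to look for the force_cpu attribute in the generated program. The helper below is a minimal sketch, not part of this MR (dump_force_cpu_attrs is a hypothetical name); it only assumes the standard Program/Operator introspection API (program.blocks, op.has_attr, op.attr).

import paddle.fluid as fluid

def dump_force_cpu_attrs(program=None):
    # Print every Op and whether it exposes a force_cpu attribute.
    # Only Ops that have force_cpu can be affected by init_on_cpu();
    # the rest simply ignore the context.
    program = program or fluid.default_main_program()
    for block in program.blocks:
        for op in block.ops:
            if op.has_attr("force_cpu"):
                print(op.type, "force_cpu =", op.attr("force_cpu"))
            else:
                print(op.type, "(no force_cpu attr)")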

test case 1:

import paddle.fluid as fluid
from paddle.fluid.initializer import init_on_cpu
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter

LEARNING_RATE = 0.01
TOTAL_STEP = 30000
POWER = 0.9

def poly_decay():
    global_step = _decay_step_counter()
    # init_on_cpu() is expected to pin the lr computation to the CPU,
    # but log1 below shows the pow Op still runs its CUDA kernel.
    with init_on_cpu():
        decayed_lr = LEARNING_RATE * (fluid.layers.pow(
            (1 - global_step / TOTAL_STEP), POWER))
    return decayed_lr

lr = poly_decay()
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
result = exe.run(fetch_list=[lr])

log1: At runtime, the CUDA kernel for the pow Op is executed.

I0106 08:42:36.204762  7559 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[tmp_0:float[1]({})]}, outputs:{Out[tmp_1:float[1]({})]}.
I0106 08:42:36.204845  7559 executor_gc_helper.cc:166] Erase variable tmp_0
I0106 08:42:36.204968  7559 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:42:36.205127  7559 operator.cc:975] CUDAPlace(0) Op(pow), inputs:{FactorTensor[], X[tmp_1:float[1]({})]}, outputs:{Out[pow_0.tmp_0:float[1]({})]}.
I0106 08:42:36.205235  7559 executor_gc_helper.cc:166] Erase variable tmp_1
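Because the context has no effect on the pow Op, this MR simply removes init_on_cpu from the models. A minimal sketch of the same schedule with the context dropped (not copied verbatim from the diff), reusing the imports and constants of test case 1:

def poly_decay():
    # Identical polynomial decay, with the ineffective init_on_cpu() removed.
    global_step = _decay_step_counter()
    decayed_lr = LEARNING_RATE * (fluid.layers.pow(
        (1 - global_step / TOTAL_STEP), POWER))
    return decayed_lr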

test case 2:

import math
import paddle.fluid as fluid
import paddle.fluid.layers.ops as ops
from paddle.fluid.initializer import init_on_cpu
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter

def cosine_decay_with_warmup(learning_rate, step_each_epoch, epochs=120):
    """Applies cosine decay to the learning rate.
    lr = 0.05 * (math.cos(epoch * (math.pi / 120)) + 1)
    decrease lr for every mini-batch and start with warmup.
    """
    global_step = _decay_step_counter()
    lr = fluid.layers.tensor.create_global_var(
        shape=[1],
        value=0.0,
        dtype='float32',
        persistable=True,
        name="learning_rate")

    warmup_epoch = fluid.layers.fill_constant(
        shape=[1], dtype='float32', value=float(5), force_cpu=True)

    # As in test case 1, init_on_cpu() is expected to keep these Ops on the
    # CPU, but log2 below shows a mix of CUDA and CPU kernels being run.
    with init_on_cpu():
        epoch = ops.floor(global_step / step_each_epoch)
        with fluid.layers.control_flow.Switch() as switch:
            with switch.case(epoch < warmup_epoch):
                decayed_lr = learning_rate * (global_step /
                                              (step_each_epoch * warmup_epoch))
                fluid.layers.tensor.assign(input=decayed_lr, output=lr)
            with switch.default():
                decayed_lr = learning_rate * \
                    (ops.cos((global_step - warmup_epoch * step_each_epoch) * (math.pi / (epochs * step_each_epoch))) + 1)/2
                fluid.layers.tensor.assign(input=decayed_lr, output=lr)
    return lr

lr = cosine_decay_with_warmup(0.001, 1000)
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
result = exe.run(fetch_list=[lr])

log2:

  • The CUDA kernel for the floor Op is executed.
  • The CPU kernel for the less_than Op is executed because that kernel is chosen by default.
  • The CPU kernel for the logical_not Op is executed because the kernel is chosen according to the place of its input (X), which is the output of less_than.
  • The CUDA kernels for the elementwise_div and assign Ops are executed.
I0106 08:01:52.546391  7295 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[cast_0.tmp_0:float[1]({})]}, outputs:{Out[tmp_0:float[1]({})]}.
I0106 08:01:52.546515  7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.546736  7295 operator.cc:975] CUDAPlace(0) Op(floor), inputs:{X[tmp_0:float[1]({})]}, outputs:{Out[floor_0.tmp_0:float[1]({})]}.
I0106 08:01:52.546887  7295 executor_gc_helper.cc:166] Erase variable tmp_0
I0106 08:01:52.546960  7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CPUPlace]:library_type[PLAIN]
I0106 08:01:52.547022  7295 operator.cc:1155] Transform Variable floor_0.tmp_0 from data_type[float]:data_layout[NCHW]:place[CUDAPlace(0)]:library_type[PLAIN] to data_type[float]:data_layout[ANY_LAYOUT]:place[CPUPlace]:library_type[PLAIN]
I0106 08:01:52.547152  7295 scope.cc:169] Create variable floor_0.tmp_0
I0106 08:01:52.547196  7295 data_device_transform.cc:21] DeviceTransform in, src_place CUDAPlace(0) dst_place: CPUPlace
I0106 08:01:52.547309  7295 tensor_util.cu:129] TensorCopySync 1 from CUDAPlace(0) to CPUPlace
I0106 08:01:52.547488  7295 operator.cc:975] CPUPlace Op(less_than), inputs:{X[floor_0.tmp_0:float[1]({})], Y[fill_constant_0.tmp_0:float[1]({})]}, outputs:{Out[tmp_1:bool[1]({})]}.
I0106 08:01:52.547566  7295 executor_gc_helper.cc:166] Erase variable floor_0.tmp_0
I0106 08:01:52.547648  7295 operator.cc:1060] expected_kernel_key:data_type[bool]:data_layout[ANY_LAYOUT]:place[CPUPlace]:library_type[PLAIN]
I0106 08:01:52.547739  7295 operator.cc:975] CPUPlace Op(logical_not), inputs:{X[tmp_1:bool[1]({})]}, outputs:{Out[logical_not_0.tmp_0:bool[1]({})]}.
I0106 08:01:52.547811  7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.547873  7295 operator.cc:1155] Transform Variable fill_constant_0.tmp_0 from data_type[float]:data_layout[NCHW]:place[CPUPlace]:library_type[PLAIN] to data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.547971  7295 scope.cc:169] Create variable fill_constant_0.tmp_0
I0106 08:01:52.548050  7295 data_device_transform.cc:21] DeviceTransform in, src_place CPUPlace dst_place: CUDAPlace(0)
I0106 08:01:52.548118  7295 tensor_util.cu:129] TensorCopySync 1 from CPUPlace to CUDAPlace(0)
I0106 08:01:52.548285  7295 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[fill_constant_0.tmp_0:float[1]({})]}, outputs:{Out[tmp_2:float[1]({})]}.
I0106 08:01:52.548383  7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.548444  7295 operator.cc:1155] Transform Variable cast_0.tmp_0 from data_type[float]:data_layout[NCHW]:place[CPUPlace]:library_type[PLAIN] to data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.548502  7295 scope.cc:169] Create variable cast_0.tmp_0
I0106 08:01:52.548607  7295 data_device_transform.cc:21] DeviceTransform in, src_place CPUPlace dst_place: CUDAPlace(0)
I0106 08:01:52.548699  7295 tensor_util.cu:129] TensorCopySync 1 from CPUPlace to CUDAPlace(0)
I0106 08:01:52.548949  7295 operator.cc:975] CUDAPlace(0) Op(elementwise_div), inputs:{X[cast_0.tmp_0:float[1]({})], Y[tmp_2:float[1]({})]}, outputs:{Out[tmp_3:float[1]({})]}.
I0106 08:01:52.549113  7295 executor_gc_helper.cc:166] Erase variable tmp_2
I0106 08:01:52.549221  7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.549465  7295 operator.cc:975] CUDAPlace(0) Op(scale), inputs:{ScaleTensor[], X[tmp_3:float[1]({})]}, outputs:{Out[tmp_4:float[1]({})]}.
I0106 08:01:52.549556  7295 executor_gc_helper.cc:166] Erase variable tmp_3
I0106 08:01:52.549700  7295 conditional_block_op.cc:64] Conditional block.idx = 1, scope = 0x7f1e8166e8f0
I0106 08:01:52.549901  7295 executor.cc:123] Creating Variables for block 1
I0106 08:01:52.550024  7295 operator.cc:1060] expected_kernel_key:data_type[float]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
I0106 08:01:52.550144  7295 tensor_util.cu:34] TensorCopy 1 from CUDAPlace(0) to CUDAPlace(0)
I0106 08:01:52.550418  7295 operator.cc:975] CUDAPlace(0) Op(assign), inputs:{X[tmp_4:float[1]({})]}, outputs:{Out[learning_rate:float[1]({})]}.
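As a sanity check (not part of this MR), one can also verify that removing init_on_cpu does not change the computed learning-rate values, since the context never took effect for these Ops. A rough sketch, reusing the poly-decay schedule from test case 1:

import numpy as np
import paddle.fluid as fluid
from paddle.fluid.initializer import init_on_cpu
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter

def build_poly_lr(use_init_on_cpu):
    # Build the schedule in a fresh program, with or without init_on_cpu().
    main_prog, startup_prog = fluid.Program(), fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        global_step = _decay_step_counter()
        if use_init_on_cpu:
            with init_on_cpu():
                lr = 0.01 * fluid.layers.pow(1 - global_step / 30000, 0.9)
        else:
            lr = 0.01 * fluid.layers.pow(1 - global_step / 30000, 0.9)
    return main_prog, startup_prog, lr

exe = fluid.Executor(fluid.CUDAPlace(0))
values = []
for flag in (True, False):
    main_prog, startup_prog, lr = build_poly_lr(flag)
    exe.run(startup_prog)
    values.append(exe.run(main_prog, fetch_list=[lr])[0])
print(np.allclose(values[0], values[1]))  # expected: True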
Reference: paddlepaddle/models!4164
Source branch: github/fork/zhangting2020/init_on_cpu