[Feature] Add buffer shared identity inplace pass !25900

!25900 Open · Created Aug 03, 2020 by saxon_zh@saxon_zh

Created by: zhiqiu

PR types

New features

PR changes

Others

Describe

Merge PR #21242: Add BufferSharedIdentityInplaceOpPass

Details from #21242

Background

This PR further enhances the in-place strategy.

Some operators do not change the input data even when in-place reuse is performed. These operators (we call them "identity ops") include:

  • reshape, reshape_grad and reshape_grad_grad
  • squeeze and squeeze_grad
  • unsqueeze and unsqueeze_grad
  • flatten and flatten_grad
  • assign and its grad (the grad of assign is assign itself)
  • ...

Suppose that X is the input of both op1 and op2, where op1 is an identity op and op2 is a non-identity op. In-place reuse can be performed safely in op1 because op1 does not change the data of X even when its buffer is shared. This PR adds BufferSharedIdentityInplaceOpPass to enable this kind of in-place strategy.

For example:

x2 = fluid.layers.reshape(x1, ...)
x3 = non_identity_op(x1)

Although x1 is the input of two ops, in-place reuse can be performed safely when running reshape.
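
For context, these buffer-sharing passes take effect when in-place reuse is enabled on the build strategy used to compile the program. Below is a minimal sketch in the fluid 1.x API; the variable names mirror the example above and the program is only illustrative:

import paddle.fluid as fluid

# Hypothetical program mirroring the example above.
x1 = fluid.data(name='x1', shape=[-1, 4, 8], dtype='float32')
x2 = fluid.layers.reshape(x1, shape=[-1, 32])  # identity op: may share x1's buffer
x3 = fluid.layers.relu(x1)                     # non-identity op that also reads x1

# Assuming the identity inplace pass is applied together with the other
# buffer-sharing passes when enable_inplace is set on the build strategy.
build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = True

compiled_prog = fluid.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(
        build_strategy=build_strategy)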

Design of BufferSharedIdentityInplaceOpPass

1. Do not consider last lived ops only

Suppose that we have a network like:

x2 = fluid.layers.reshape(x1, ...)
x3 = op2(x2)
x4 = op3(x1, x3)

The last lived op of x1 is only op3 (because reshape runs strictly before op3 in the graph). However, since the last version of x1 is only read by reshape and op3, x1 can still be identity-inplaced in this case. Therefore, this PR checks whether the last lived ops of x1 only read x1; if so, it scans all ops that read x1 to find the identity inplace candidates, instead of scanning only the last lived ops of x1.
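
This rule might be summarized as in the following sketch (hypothetical Python pseudocode; the actual pass is implemented in C++ in the graph pass framework, and the helper predicates is_identity_op and op_writes are assumptions for illustration):

def collect_identity_inplace_candidates(var, last_lived_ops, reader_ops,
                                        is_identity_op, op_writes):
    # If every last-lived op only reads `var` (none writes it), the last
    # version of `var` stays valid until all readers finish, so every
    # reader of `var` is a candidate; otherwise only the last-lived ops
    # are considered.
    if all(not op_writes(op, var) for op in last_lived_ops):
        candidates = reader_ops
    else:
        candidates = last_lived_ops
    return [op for op in candidates if is_identity_op(op)]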

2. Non-branched identity inplace and branched identity inplace

There are two kinds of identity inplace reuse:

  • Non-branched identity inplace: input X is reused by only one output var along a chain, such as:
X -> reshape -> Y1 -> squeeze -> Y2 -> op1 -> ...

In-place reuse can be performed in reshape and squeeze regardless of whether op1 is an in-place non-identity op (e.g., relu).

  • Branched identity inplace: input X may be reused by several output vars. Branched non-identity inplace is not allowed, since those ops would change the input's data, but branched identity inplace is allowed. For example:

    +-> reshape -> Y1 -> op1 -> ...
X --+
    +-> squeeze -> Y2 -> op2 -> ...

If op1 is an in-place relu, the in-place reuse in reshape must not happen! Therefore, when recording the identity inplace ops, we must also know whether the leaf vars of the branched inplace tree will be non-identity inplaced. In this PR, we record all leaf var nodes that are non-identity inplaced and prune them when branched identity inplace happens, as sketched below.
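
A hypothetical sketch of this pruning rule (the real bookkeeping lives in the C++ pass; all names here are illustrative):

def prune_branched_leaves(branch_leaves, non_identity_inplaced_leaves):
    # In a branched identity inplace tree, drop every branch whose leaf
    # var is later reused by a non-identity inplace op (e.g. an in-place
    # relu): such an op would overwrite the buffer shared with X and the
    # other branches.
    return [leaf for leaf in branch_leaves
            if leaf not in non_identity_inplaced_leaves]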

3. Mark some leaf vars as non-reusable to avoid further reuse errors

There are two cases in which the leaf vars must not be reused further (i.e., reused inside BufferSharedCrossOpMemoryReusePass):

  • Branched identity inplace happens. If any leaf var is reused by other vars, the calculation result may be wrong.
  • The last lived ops of X are not a subset of the identity inplace ops. In this case, a last lived op of X may read the data of X after it has been changed by another memory reuse process, so the calculation result may be wrong too. Both rules are sketched below.
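
Both rules might be condensed into a sketch like the following (hypothetical; vars marked this way would be skipped by later reuse passes such as BufferSharedCrossOpMemoryReusePass):

def mark_non_reusable_leaves(leaf_vars, branched,
                             last_lived_ops, identity_inplace_ops):
    non_reusable = set()
    # Rule 1: with branched identity inplace, reusing any leaf var would
    # clobber the buffer shared by all branches.
    if branched:
        non_reusable.update(leaf_vars)
    # Rule 2: if some last-lived op of X is not an identity inplace op,
    # it may read X after another reuse pass has rewritten X's buffer.
    if not set(last_lived_ops).issubset(identity_inplace_ops):
        non_reusable.update(leaf_vars)
    return non_reusable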

Performance

Memory

We evaluate the max batch_size of the transformer model on a single V100 card. The allocator strategy is the default auto_growth, with GC and inplace enabled.

                                    dev     pr
model with reshape(inplace=True)    10323   10322
model with reshape(inplace=False)   10326   10424

From the table above, we can conclude that:

  • Setting fluid.layers.reshape(inplace=True) helps little on the transformer model, so the inplace parameter of reshape can be removed.
  • BufferSharedIdentityInplaceOpPass increases the max batch_size of the transformer model from 10323 to 10424.

Speed

We evaluate the training speed of the transformer model on a single V100 card. The allocator strategy is the default auto_growth, with GC and inplace enabled; batch_size is 10250.

                                    dev (step/s)   pr (step/s)
model with reshape(inplace=True)    2.23           2.21858
model with reshape(inplace=False)   2.23568        2.20041

From the table above, we can conclude that:

  • With reshape(inplace=True), training on the develop branch speeds up from 2.18 to 2.21 step/s, about 1.4%.
  • With this PR, training on the develop branch speeds up from -
Reference: paddlepaddle/Paddle!25900
Source branch: github/fork/zhiqiu/add_buffer_shared_indentity_inplace_pass