Feature/memory transplier !11386

!11386 · Closed · Created June 12, 2018 by saxon_zh@saxon_zh
  • Overview 2
  • Commits 20
  • Changes 7

Created by: dzhwinter

A re-implementation of the memory transpiler. Main changes:

  1. Add an in-place cache strategy. We provide the Reuse tag in OpProto, and the memory transpiler reuses the input variable's memory when the op can run in-place (see the first sketch after this list).

  2. More memory savings are triggered, not only for blocks with equal shapes. Currently, our memory transpiler only reuses memory blocks whose shapes match exactly. I did not figure out why the previous implementation does not support larger memory blocks: obviously, if a cached memory block is bigger than we need, we can reuse it as well (see the second sketch after this list). https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/transpiler/memory_optimization_transpiler.py#L189 https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/transpiler/memory_optimization_transpiler.py#L252

  3. Compute the liveness set with a new algorithm. An important step in our transpiler is computing the variable liveness set. However, when the op count grows (for example, SE-ResNeXt-152 has 4000 ops), the previous reach-fixpoint algorithm takes a long time to converge. This PR uses the worklist algorithm, which converges faster than the reach-fixpoint algorithm (see the third sketch after this list).

  4. The SSA-form graph optimized liveness algorithm. Since our memory strategy relies heavily on variable liveness ranges, one idea is to generate more precise variable liveness information. I followed this paper, https://hal.inria.fr/inria-00558509v1/document, and tried the SSA-graph-based liveness algorithm in https://github.com/PaddlePaddle/Paddle/pull/11385. However, I found some issues while implementing it.
     First, the more precise SSA-form liveness set rests on the assumption that loops and if-conditions appear everywhere, because loops and if-conditions are converted to phi variables, which dominate the later analysis; if they do not appear, it works the same as the normal liveness analysis algorithm. Second, I did a forward analysis of our test program by hand (that is, computed each variable's live range) and found that the result is the same as with the normal liveness analysis. Third, in most SSA-graph applications, the SSA program is an intermediate representation of the user program, and it is not easy to convert an SSA-form graph back to a ProgramDesc (which I would need to do after finishing the memory transpile). In conclusion, I implemented the worklist-based liveness algorithm instead of the SSA-graph one.
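A minimal sketch of the in-place reuse idea in point 1, using plain Python dictionaries rather than Paddle's actual classes; the op representation, the `reuse_table`, and the helper name are hypothetical stand-ins for the Reuse tag in OpProto, not the real interface.

```python
# Hypothetical sketch of point 1: ops whose proto carries a "Reuse" tag let an
# output share its input's memory. Ops are plain dicts here, not Paddle ops.

def apply_inplace_reuse(ops, reuse_table):
    """ops: list of {"type": str, "inputs": {slot: name}, "outputs": {slot: name}}.
    reuse_table: op type -> {output_slot: input_slot} pairs declared safe to
    run in-place (assumed to be derived from the Reuse tag in OpProto)."""
    renamed = {}
    for op in ops:
        # Apply renames produced by earlier in-place decisions.
        for slot, name in op["inputs"].items():
            op["inputs"][slot] = renamed.get(name, name)
        for out_slot, in_slot in reuse_table.get(op["type"], {}).items():
            in_name = op["inputs"].get(in_slot)
            out_name = op["outputs"].get(out_slot)
            if in_name and out_name and in_name != out_name:
                # The output now aliases the input's buffer; remember the rename
                # so later uses of the old output name point at the same buffer.
                op["outputs"][out_slot] = in_name
                renamed[out_name] = in_name
    return ops
```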
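A sketch of the relaxed reuse rule in point 2: instead of requiring an exact shape match, pick a dead block whose element count is at least the request, preferring the tightest fit. The free-pool representation and function names are illustrative, not the transpiler's actual code.

```python
# Illustrative best-fit lookup for point 2: reuse any cached block that is at
# least as large as the request, not only blocks with the identical shape.

from functools import reduce

def numel(shape):
    return reduce(lambda a, b: a * b, shape, 1)

def find_reusable(pool, needed_shape):
    """pool: list of (var_name, shape) entries whose variables are already dead.
    Returns the (var_name, shape) of the smallest block that still fits, or None."""
    need = numel(needed_shape)
    fits = [(numel(shape), var_name, shape) for var_name, shape in pool
            if numel(shape) >= need]
    if not fits:
        return None
    _, var_name, shape = min(fits, key=lambda t: t[0])
    return var_name, shape

# Example: a (4, 8) request can reuse the dead (2, 32) block even though the
# shapes differ, because 64 elements >= 32 elements.
print(find_reusable([("a", (2, 32)), ("b", (1, 8))], (4, 8)))
```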
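A generic worklist formulation of the backward liveness computation mentioned in point 3; this is textbook dataflow code over a toy op representation, not the PR's implementation. Compared with re-sweeping every op until a global fixpoint is reached, only ops whose successors changed are revisited.

```python
# Worklist liveness sketch for point 3. Each op is {"use": set, "def": set,
# "succ": [indices of successor ops]}; live_in/live_out are computed backward.

from collections import deque

def compute_liveness(ops):
    n = len(ops)
    live_in = [set() for _ in range(n)]
    live_out = [set() for _ in range(n)]
    preds = [[] for _ in range(n)]
    for i, op in enumerate(ops):
        for s in op["succ"]:
            preds[s].append(i)

    worklist = deque(range(n))
    while worklist:
        i = worklist.pop()
        op = ops[i]
        out = set()
        for s in op["succ"]:
            out |= live_in[s]
        live_out[i] = out
        new_in = op["use"] | (out - op["def"])
        if new_in != live_in[i]:
            live_in[i] = new_in
            # Only predecessors of a changed op need to be re-examined.
            worklist.extend(preds[i])
    return live_in, live_out
```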

Reference: paddlepaddle/Paddle!11386
Source branch: github/fork/dzhwinter/feature/memory_transplier