PaddlePaddle / Paddle · Issue #8016
Closed · Opened Feb 01, 2018 by saxon_zh (Guest)

Need the ability to group operations together. Similar to collections in TensorFlow

Created by: abhinavarora

This problem came to light while I was investigating how we could move regularization to the pserver (https://github.com/PaddlePaddle/Paddle/issues/7432). The current distribute transpiler splits the params and grads and passes a different slice to each pserver. Hence, when we create optimize ops, we use the sliced parameters and gradients. However, the distribute transpiler currently identifies these ops through a hack: it checks whether an op has inputs named Param and Grad. This works well because optimize ops are dedicated operators such as sgd_op, adam_op, etc., which all declare those input names; a sketch of the heuristic follows below.
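For illustration, here is a minimal sketch of that heuristic. It assumes only that an op exposes an `input_names` list, as fluid operators do; the function name itself is hypothetical:

```python
def is_optimize_op(op):
    # Optimizer ops (sgd_op, adam_op, ...) declare dedicated input slots
    # named "Param" and "Grad"; the transpiler keys on those slot names.
    return "Param" in op.input_names and "Grad" in op.input_names
```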

However, for regularization and gradient clipping we rely on generic tensor ops like scale and elementwise_add. These ops take the parameters as inputs, so on the pserver they should receive the sliced parameters instead. We therefore need a way to identify these ops in the distribute transpiler so that we can pass the sliced params and grads to them. The above-mentioned hack will not work for this case, because these are generic ops whose inputs and outputs have names like X, Y, etc.
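To make the failure mode concrete, the following toy check (`FakeOp` is a stand-in for real operator metadata, not a Paddle class) shows the heuristic above matching a dedicated optimizer op but missing the ops the regularizer emits:

```python
from collections import namedtuple

FakeOp = namedtuple("FakeOp", ["type", "input_names"])

# Ops emitted by the regularizer today: generic slot names.
scale = FakeOp("scale", ["X"])
add = FakeOp("elementwise_add", ["X", "Y"])
# A dedicated optimizer op: named slots the transpiler can key on.
sgd = FakeOp("sgd", ["Param", "Grad", "LearningRate"])

assert is_optimize_op(sgd)
assert not is_optimize_op(scale)   # the parameter hides behind "X"
assert not is_optimize_op(add)
```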

A hacky solution would be to create dedicated ops for regularization. Currently, the regularization layer adds a scale op and an elementwise_add op in Python. Instead, we could create a single C++ op that composes these two operations.

A better and more sustainable solution would be to support adding tags to Python ops, which would let us group ops that share a tag. This way, all the ops added for regularization would carry a regularization tag, and gradient clipping ops would likewise carry their own tag. The distribute transpiler could then look up ops by tag and apply whatever slicing logic it needs to them. These tags are similar to the concept of collections in TensorFlow; a sketch of such an API is given below.
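As a rough, self-contained illustration of the proposal (this is not an existing Paddle API: `Op`, `Block`, `ops_with_tag`, and the tag names are all toy stand-ins), each op could carry a set of tags and a block could be queried by tag:

```python
REGULARIZATION_TAG = "regularization"
GRADIENT_CLIP_TAG = "gradient_clip"   # clipping ops would get their own tag

class Op:
    def __init__(self, type, inputs, outputs, tags=()):
        self.type = type
        self.inputs = inputs        # e.g. {"X": "fc_0.w"}
        self.outputs = outputs
        self.tags = set(tags)       # the proposed grouping mechanism

class Block:
    def __init__(self):
        self.ops = []

    def append_op(self, **kwargs):
        op = Op(**kwargs)
        self.ops.append(op)
        return op

    def ops_with_tag(self, tag):
        # What the distribute transpiler would call: find every op in a
        # group regardless of its type or input names, then rewrite its
        # inputs to the sliced params/grads owned by this pserver.
        return [op for op in self.ops if tag in op.tags]

# The regularization layer would tag everything it emits:
block = Block()
block.append_op(type="scale", inputs={"X": "fc_0.w"},
                outputs={"Out": "fc_0.w@decay"},
                tags=[REGULARIZATION_TAG])
block.append_op(type="elementwise_add",
                inputs={"X": "fc_0.w@GRAD", "Y": "fc_0.w@decay"},
                outputs={"Out": "fc_0.w@GRAD"},
                tags=[REGULARIZATION_TAG])

assert [op.type for op in block.ops_with_tag(REGULARIZATION_TAG)] == \
       ["scale", "elementwise_add"]
```

With something like this in place, the transpiler no longer needs to guess from input names at all; group membership is explicit, which is exactly what TensorFlow's collections provide.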

Reference: paddlepaddle/Paddle#8016