1. 17 9月, 2021 3 次提交
    • Z
      Make flag adding easier (#35823) · 2c781455
      Zeng Jinle 提交于
      * make flag setter easier
      
      * update
      
      * rename macro name
      
      * fix bug of public/writable
      
      * update to pass CI
      
      * polish
      
      * fix CPU link error
      2c781455
    • L
      expose cuda stream to users (#35813) · 40cfa512
      Leo Chen 提交于
      * expose cuda stream to users
      
      * add ut
      40cfa512
    • W
      GeneratePass for Python Pass (#35708) · f6db9806
      wuhuanzhou 提交于
      #### 背景
      
      #35602 提供Python侧开发子图替换类Pass的方式:
      
      - 利用Paddle Python API或者辅助类型定义子图program用来匹配/替换图;
      - Python侧注册Pass时,将注册函数最终转换为protobuf定义的PassDesc数据形式,供C++侧进行解析完成Pass实例注册。
      
      本PR即为根据PassDesc规则描述解析生成Pass实例。
      
      #### 方案设计
      
      ##### Pass规则验证
      
      在以往的Pass开发中,会存在随着算子迭代引发的匹配失效或者错误匹配的问题,该问题可以通过扫描算子支持的参数设置及参数类型等来判断是否应该使用该Pass或者给出提示需要修改Pass代码。
      
      当前Pass开发中提供了算子兼容性OpCompatSensiblePass用于解决上述问题。但同时还存在不足:由于以往Pass开发在运行时才能获取到pattern信息,所以需要在执行Pass时才可以判断。
      
      使用PassDesc表示的Pass可以在执行Pass前验证上述问题,这个过程在VerifyDesc中完成。
      
      ##### 根据匹配子图构造pattern
      
      GeneratePass对于图匹配和替换使用GraphPatternDecetor完成,构造匹配pattern实际上就是将对应对象成员PDPattern中添加PDNode和边关系。该过程在函数`InitGeneratePattern`中完成,该函数没有作为GeneratePass的成员方法,主要出于后续可能开发新的Decetor考虑,GeneratePass与Decetor的操作是没有关联的。
      
      初始化pattern主要通过遍历匹配子图program的全部算子实现:
      
      1. 添加当前算子对应PDNode及限制条件(算子类型、属性限制等);
      2. 遍历当前算子对应输入并从pattern中尝试获取PDNode:
         - 在pattern中获取到PDNode且为输出节点:表示属于匹配子图的中间节点,将该PDNode设置为中间节点;
         - 在pattern中没有获取到PDNode:添加该输入PDNode并设置作为输入节点;
         - 设置输入到算子的边关系;
      3. 遍历当前算子对应输出:
         - 在pattern中获取到PDNode且为输入节点:表示属于匹配子图的中间节点,将该PDNode设置为中间节点;
         - 在pattern中没有获取到PDNode:添加该输入PDNode并设置作为输出节点;
         - 设置算子到输出的边关系;
      
      ##### 根据替换子图操作graph
      
      替换子图操作的过程在`GetGenerateRewrite`函数中完成,与`InitGeneratePattern`类似没有作为GeneratePass的成员方法。
      
      生成替换子图操作过程如下:
      
      1. 判断冗余替换子图;
      2. 遍历替换子图program的全部算子添加替换子图Node:
         1. 添加当前算子的Node及属性设置;
         2. 遍历当前算子对应输入,添加中间variable节点;
         3. 遍历当前算子对应输出,添加中间variable节点;
         4. 添加输入/输出节点与算子节点的边关系;
      3. 删除匹配图中属于中间节点的Node;
      
      ##### 优化子图验证
      
      对于替换子图或者替换后的计算图是否可以正确运行等,可以在执行Pass时验证,从而防止在后续执行计算图时出现异常。
      
      当前Pass执行直接修改计算图,验证失败时无法很好的完成还原操作,目前子图验证暂时默认成功,留到后续改进。
      f6db9806
  2. 16 9月, 2021 2 次提交
  3. 15 9月, 2021 4 次提交
  4. 14 9月, 2021 2 次提交
  5. 11 9月, 2021 1 次提交
  6. 10 9月, 2021 1 次提交
  7. 09 9月, 2021 1 次提交
    • 0
      Add matrix_rank Op and it's GPU and CPU kernel (#34823) · eb1fbf12
      0x45f 提交于
      * init matrix_rank op, add matrix_rank CPU code and test
      
      * add GPU kernel, remove svd_eigen.h
      
      * add CPU kernel when tol is tensor
      
      * add cpu and gpu code when tol is tensor
      
      * fix CI-ROCM error
      
      * add matrix_rank API describe, fix PR-CI-Py3 error
      
      * fix PR-CI-Windows error, add matrix_rank API test
      
      * delete useless comments
      
      * fix review
      
      * add my code in svd_helper.h
      
      * update doc commets
      
      * remove spaces
      eb1fbf12
  8. 08 9月, 2021 4 次提交
  9. 07 9月, 2021 1 次提交
  10. 06 9月, 2021 1 次提交
  11. 04 9月, 2021 1 次提交
  12. 02 9月, 2021 1 次提交
  13. 01 9月, 2021 1 次提交
    • Z
      Support settiem by Bool index (#35133) · d387820d
      zyfncg 提交于
      * Support getitem by Bool index
      
      * delete some debug info of bool index
      
      * support the case that the shape of bool index is different from indexed tensor
      
      * support setitem by bool index
      
      * add the unittest for throwing exception
      
      * merge conflict
      
      * add check for int tensor when index is bool
      d387820d
  14. 31 8月, 2021 2 次提交
  15. 30 8月, 2021 1 次提交
  16. 27 8月, 2021 2 次提交
  17. 26 8月, 2021 3 次提交
    • S
      Add paddle.utils.dlpack APIs (#35067) · 8dc050d8
      Siming Dai 提交于
      * add dlpack api and fix a from_dlpack 
      8dc050d8
    • W
      support tensor index. (#34824) · e7df47ec
      WeiXin 提交于
      * polish code
      
      * polish code.
      
      * polish code.
      
      * polish code.
      
      * polish code.
      e7df47ec
    • S
      Add copy from tensor (#34406) · ac33c0ca
      Shang Zhizhou 提交于
      * add api
      
      * temp save
      
      * revert
      
      * copytocpu async ok
      
      * fix style
      
      * copy sync ok
      
      * fix compile error
      
      * fix compile error
      
      * api done
      
      * update python async api
      
      * fix compile
      
      * remove async python api; add c++ async unittest
      
      * remove python async api
      
      * update unittest
      
      * update unittest
      
      * add C++ unittest for copytensor
      
      * add unittest
      
      * update namespace utils to class TensorUtils
      
      * add unittest
      
      * update unittest
      
      * update unittest
      
      * update code style
      
      * update code style
      
      * update unittest
      ac33c0ca
  18. 25 8月, 2021 1 次提交
    • L
      fix potential tensor leak in tensor.__setitem__ (#35013) · 763b6d91
      Leo Chen 提交于
      * fix index tensor leak in __setitem__
      
      * fix another usage of PyTuple_Pack
      
      * refine code
      
      * refine code
      
      * handle None index
      
      * add Py_DecRef
      
      * revert ut
      
      * refine code
      
      * merge develop
      
      * use RAII
      
      * follow comments
      763b6d91
  19. 24 8月, 2021 2 次提交
    • W
      add fetch, test=develop (#35019) · a5060b55
      wanghuancoder 提交于
      * add fetch, test=develop
      
      * fix fetch2op, test=develop
      
      * fix fetch2op, test=develop
      
      * refine, test=develop
      
      * fix fetch ctx, test=develop
      
      * add wait, test=develop
      
      * rename fetch2 to fetch_v2, test=develop
      
      * merge, test=develop
      a5060b55
    • Y
      Add auto completion module for auto parallel (#34813) · 93d862b0
      Yulong Ao 提交于
      * add auto_parallel dir
      
      * mv to paddle.distributed
      
      * add shard_xx api
      
      * add distributed attrs for var
      
      * add ut, test=develop
      
      * add dist
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update, test=develop
      
      * update, test=develop
      
      * update
      
      * update
      
      * delete unused proto
      
      * resotre op_desc
      
      * restore type_defs
      
      * update var_desc
      
      * remove dimss_mapping for proto_pybind
      
      * update interface.py
      
      * update framework.py
      
      * update
      
      * update
      
      * add auto_parallel dir
      
      * mv to paddle.distributed
      
      * add shard_xx api
      
      * add distributed attrs for var
      
      * add ut, test=develop
      
      * [WIP] Add the auto completion feature and related codes
      
      * [WIP] Improve the auto completion and related codes
      
      * [WIP] Make the auto completion to support data-parallel
      
      * [WIP] Make the completion support mp and dp+mp
      
      * [WIP] Refactor auto completion unit test for MLP
      
      * [WIP] Refactor the implementation of DistributedOperatorImpl
      
      * [WIP] Improve dims_mapping update rule and fix a bug
      
      * [WIP] Support auto completion for one transformer decoder layer
      
      * [WIP] Add a minor change
      
      * [WIP] Fix a bug within the uint test
      
      * Shard XShape tensor, add embedding completion and refactor code
      
      * Add the distributed_operators dir to setup.py.in
      
      * Improve the completion process and add the unittest for gpt
      
      * fix process_mesh ut
      
      * fix process_mesh ut
      
      * update
      
      * update, test=develop
      
      * Add support for automatically completing distributed attrs of special ops
      
      * update
      
      * update
      
      * update
      
      * fix doc sample codes, test=develop
      
      * improve coverage, test=develop
      
      * add static_mode check, test=develop
      
      * Model the cluster for cost model and physical mapping
      
      * update, test=develop
      
      * add set_placement, test=develop
      
      * Add the check to make sure the candidate tensors' size is great than zero
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update, test=develop
      
      * Auto mark dist attrs annotated by user
      
      * update ndarray to nested list, test=develop
      
      * update, test=develop
      
      * Add auto-completion module for auto-parallel (based on PR#33804)
      
      * Remove unnecessary files
      
      * Remove unrelated files for the auto completion pr
      
      * Update the unit test to improve the coverage
      
      * Modify codes based on reviews
      
      * Minor changes for CI
      
      * Improve some codes based on new comments
      
      * Fix bugs caused by shallow copy in attributes.py
      * Imporve amend_distributed_attr_for_program in context.py
      * Other changes for weihang's comments
      Co-authored-by: Nsandyhouse <lilong12@baidu.com>
      93d862b0
  20. 23 8月, 2021 4 次提交
  21. 19 8月, 2021 1 次提交
  22. 18 8月, 2021 1 次提交