1. 29 Oct, 2021 (7 commits)
    • 113816d8
    • fix matmul error when input's dim is 3 (#36849) · f6b4ed22
      Committed by baoachun
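      A hedged illustration of the rank-3 case this fix targets (shapes are illustrative, not taken from the PR): with 3-D inputs, paddle.matmul performs a batched matrix multiply over the leading dimension.

      ```python
      import paddle

      x = paddle.randn([4, 2, 3])  # batch of 4 matrices, each 2x3
      y = paddle.randn([4, 3, 5])  # batch of 4 matrices, each 3x5
      out = paddle.matmul(x, y)    # batched matmul -> shape [4, 2, 5]
      print(out.shape)             # [4, 2, 5]
      ```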
    • add new API/OP: paddle.linalg.triangular_solve (#36714) · 92d6a048
      Committed by zhouweiwei2014
      * add new API: paddle.linalg.triangular_solve
      
      * add new API/OP: paddle.linalg.triangular_solve
      
      * add new API/OP: paddle.linalg.triangular_solve
      
      * fix comment
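      A hedged usage sketch (values chosen for illustration): the new API solves the triangular system A @ X = B, taking the triangular coefficient matrix first.

      ```python
      import paddle

      # upper-triangular system A @ X = B
      A = paddle.to_tensor([[1.0, 1.0, 1.0],
                            [0.0, 2.0, 1.0],
                            [0.0, 0.0, -1.0]])
      B = paddle.to_tensor([[0.0], [-9.0], [5.0]])
      X = paddle.linalg.triangular_solve(A, B, upper=True)
      print(X.numpy())  # [[ 7.] [-2.] [-5.]]
      ```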
    • fix some bug in new executor (#36822) · b5af9575
      Committed by wanghuancoder
      * fix some bug in new executor, test=develop
      
      * fix error message, test=develop
    • [new-exec] enable check_nan_inf (#36802) · be55bac3
      Committed by Leo Chen
      * enable check_nan_inf and fix variable scope
      
      * add ut
      
      * fix bug
      
      * update ut
      
      * revert doc change
      
      * fix npu compile
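      A hedged sketch of how the check is typically switched on (the FLAGS_check_nan_inf name follows Paddle's debugging docs; treat the exact flag as an assumption for this commit):

      ```python
      import paddle

      # abort as soon as any op produces a NaN/Inf output
      paddle.set_flags({'FLAGS_check_nan_inf': True})
      # ...build and run the model as usual...
      ```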
    • fix ifftshift (missing negative sign before shifts) (#36834) · f3ee5c99
      Committed by Feiyu Chan
      1. fix ifftshift (missing negative sign before shifts);
      2. add complex data type support for paddle.shape at graph assembly.
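      A minimal NumPy sketch of the semantics the fix restores (the 1-D helpers are illustrative, not the actual patch): ifftshift must roll by the negative of fftshift's offset, and for odd lengths the sign matters.

      ```python
      import numpy as np

      def fftshift_1d(x):
          # move the zero-frequency term to the center: roll by +(n // 2)
          return np.roll(x, x.shape[0] // 2)

      def ifftshift_1d(x):
          # the inverse rolls by -(n // 2); dropping the minus sign is the
          # kind of bug the commit title describes
          return np.roll(x, -(x.shape[0] // 2))

      x = np.arange(5)
      assert np.array_equal(ifftshift_1d(fftshift_1d(x)), x)
      ```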
    • [Auto Parallel] Improve the interface and the underlying mechanisms (#36617) · a02532b5
      Committed by Yulong Ao
      * default dist op
      
      * add dist_attr for dist op
      
* add unittest
      
* update input name
      
      * update function name
      
* add unittest
      
      * update CMakeLists.txt for CI
      
* fix dist_matmul
      
      * fix compile error
      
      * update matmul to matmul_v2
      
      * unify api
      
      * unify api
      
      * todo
      
      * update distop forward func
      
      * update distop forward func
      
      * auto parallel backward
      
      * update dist op
      
      * autoparallel backward
      
      * add backward for embedding
      
      * temp1
      
      * temp2
      
      * temp3
      
      * temp4
      
      * backward done1
      
      * backward done2
      
      * backward done3
      
      * dist embedding remove mp mode
      
      * dist matmul remove mp mode
      
      * update dist embedding
      
      * dist op init1
      
      * dist op init 2
      
* update unittest
      
      * context remove parallel mode
      
      * partitioner remove parallel mode
      
* update unittest
      
      * a more general method to support varying mesh in pipeline parallel
      
      * support varying mesh in pipeline parallel
      
      * embedding support varying mesh in pipeline parallel
      
      * matmul support varying mesh in pipeline parallel
      
      * default dist op support varying mesh in pipeline parallel
      
      * dist attribute for startup program
      
      * default dist op support varying mesh in pipeline parallel 2
      
* partitioner support varying mesh in pipeline parallel
      
* revise logic for auto completion
      
      * revise framework.py
      
* revise reshard unittest
      
* revise unittest for parallelize
      
      * chmod
      
      * fixed bug for dist embedding name mapping
      
      * Improve the interface and the underlying mechanisms of auto parallel
      
      * revise completion for backward
      
      * revise completion for update
      
      * revise completion for update
      
* update unittest
      
      * chmod
      
      * bugfix for grad_op output var's mesh
      
      * Modify codes for pr 36744
      
      * Remove unnecessary comments in framework.py
      
      * Remove unnecessary comments in completion.py
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: JZ-LIANG <38102074+JZ-LIANG@users.noreply.github.com>
  2. 28 Oct, 2021 (12 commits)
  3. 27 Oct, 2021 (18 commits)
  4. 26 Oct, 2021 (3 commits)
    • Remove additional warning in layer.to (#36700) · 63f1e6bd
      Committed by Jiabin Yang
* remove additional warning in layer.to

* remove additional warning in layer.to

* remove additional warning in layer.to

* remove additional warning in layer.to

* remove additional warning in layer.to
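      A hedged sketch of the affected call (assuming the Layer.to device/dtype keywords from Paddle 2.x): after this change the conversion runs without the extra warning.

      ```python
      import paddle

      linear = paddle.nn.Linear(4, 4)
      linear.to(dtype='float64')   # convert parameters in place
      linear.to(device='cpu')
      print(linear.weight.dtype)   # paddle.float64
      ```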
    • Add fused attention op backward and python layer. (#36498) · 5119428e
      Committed by Li Min
      Purpose: this PR aims to improve the compute performance of the attention module.
      To reduce the framework's op-scheduling overhead, this PR hand-implements the attention module at the C++ level and exposes it as one large fused attention op.
      To reduce memory-access overhead, this PR applies two optimizations:
      (1) when computing q, k, and v, the input X is shared, so the gemm, transpose, and bias add there drop from three calls to one;
      (2) kernel-fusion techniques are used to pass data between different CUDA kernels through registers.
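      A minimal NumPy sketch of optimization (1) (shapes and names are illustrative, not the op's actual layout): because q, k, and v all consume the same X, their three gemm + bias-add calls fold into one gemm against a concatenated weight.

      ```python
      import numpy as np

      tokens, hidden = 8, 64
      X = np.random.randn(tokens, hidden)
      W_qkv = np.random.randn(hidden, 3 * hidden)  # [Wq | Wk | Wv] side by side
      b_qkv = np.random.randn(3 * hidden)

      qkv = X @ W_qkv + b_qkv              # one gemm + one bias add
      q, k, v = np.split(qkv, 3, axis=-1)  # instead of three of each
      ```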
    • roll_op: support Tensor as input for shifts (#36727) · 7b1e30fc
      Committed by Feiyu Chan
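      A hedged usage sketch (the 0-D Tensor form of shifts is an assumption about this PR; values are illustrative): shifts can now be a Tensor computed inside the graph rather than a Python int.

      ```python
      import paddle

      x = paddle.to_tensor([1.0, 2.0, 3.0, 4.0, 5.0])
      shifts = paddle.to_tensor(2)         # Tensor shift, per this PR
      out = paddle.roll(x, shifts=shifts)
      print(out.numpy())                   # [4. 5. 1. 2. 3.]
      ```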