1. 29 Oct, 2021 5 commits
    • B
      fix matmul error when input's dim is 3 (#36849) · f6b4ed22
      baoachun committed
      f6b4ed22
    • Z
      add new API/OP: paddle.linalg.triangular_solve (#36714) · 92d6a048
      zhouweiwei2014 committed
      * add new API: paddle.linalg.triangular_solve
      
      * add new API/OP: paddle.linalg.triangular_solve
      
      * add new API/OP: paddle.linalg.triangular_solve
      
      * fix comment
      92d6a048
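For readers unfamiliar with the operation behind the commit above, a triangular solve finds `x` in `A x = b` when `A` is triangular, using substitution instead of a general factorization. The following is a minimal pure-Python sketch of the upper-triangular case; it is not Paddle's kernel, and the helper name `solve_upper_triangular` is hypothetical.

```python
def solve_upper_triangular(a, b):
    """Solve a @ x = b by back substitution.

    `a` is an n x n upper-triangular matrix given as a list of rows with a
    nonzero diagonal; `b` is a length-n list. (Hypothetical helper, for
    illustration only.)
    """
    n = len(a)
    x = [0.0] * n
    # Work from the last row up: row i only depends on x[i+1:], already known.
    for i in range(n - 1, -1, -1):
        known = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - known) / a[i][i]
    return x


if __name__ == "__main__":
    # 2*x0 + 1*x1 = 5 and 4*x1 = 8
    print(solve_upper_triangular([[2.0, 1.0], [0.0, 4.0]], [5.0, 8.0]))
```

The lower-triangular case is the mirror image: iterate from the first row down and subtract the terms to the left of the diagonal.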
    • L
      [new-exec] enable check_nan_inf (#36802) · be55bac3
      Leo Chen committed
      * enable check_nan_inf and fix variable scope
      
      * add ut
      
      * fix bug
      
      * update ut
      
      * revert doc change
      
      * fix npu compile
      be55bac3
    • F
      1. fix ifftshift(missing negative sign before shifts); (#36834) · f3ee5c99
      Feiyu Chan committed
      2. add complex data type support for paddle.shape at graph assembly.
      f3ee5c99
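The sign issue named in the commit above can be illustrated without Paddle. In the usual definition, `fftshift` rolls a sequence by `n // 2` and `ifftshift` rolls it by `-(n // 2)`; the two only differ for odd lengths, which is why a missing negative sign is easy to overlook. The sketch below uses plain Python lists, and the helper names are hypothetical rather than Paddle's implementation.

```python
def roll(seq, shift):
    """Circularly shift a list right by `shift` positions (negative = left)."""
    n = len(seq)
    if n == 0:
        return []
    k = shift % n
    return seq[-k:] + seq[:-k] if k else list(seq)


def fftshift(seq):
    """Move the zero-frequency element to the center."""
    return roll(seq, len(seq) // 2)


def ifftshift(seq):
    """Undo fftshift; note the negative sign on the shift."""
    return roll(seq, -(len(seq) // 2))


if __name__ == "__main__":
    data = [0, 1, 2, 3, 4]                 # odd length exposes the difference
    shifted = fftshift(data)
    print(shifted)                          # shifted layout
    print(ifftshift(shifted))               # round-trips back to data
    print(roll(shifted, len(data) // 2))    # wrong sign: does not round-trip
```

For even lengths, shifting by `n // 2` and by `-(n // 2)` produces the same result, so a test suite that only uses even-sized inputs would not catch the bug.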
    • Y
      [Auto Parallel] Improve the interface and the underlying mechanisms (#36617) · a02532b5
      Yulong Ao committed
      * default dist op
      
      * add dist_attr for dist op
      
      * add unittest
      
      * update inputname
      
      * update function name
      
      * add unittest
      
      * update CMakeLists.txt for CI
      
      * fix dis_matmul
      
      * fix compile error
      
      * update matmul to matmul_v2
      
      * unify api
      
      * unify api
      
      * todo
      
      * update distop forward func
      
      * update distop forward func
      
      * auto parallel backward
      
      * update dist op
      
      * autoparallel backward
      
      * add backward for embedding
      
      * temp1
      
      * temp2
      
      * temp3
      
      * temp4
      
      * backward done1
      
      * backward done2
      
      * backward done3
      
      * dist embedding remove mp mode
      
      * dist matmul remove mp mode
      
      * update dist embedding
      
      * dist op init1
      
      * dist op init 2
      
      * update unittest
      
      * context remove parallel mode
      
      * partitioner remove parallel mode
      
      * update unittest
      
      * a more general method to support varying mesh in pipeline parallel
      
      * support varying mesh in pipeline parallel
      
      * embedding support varying mesh in pipeline parallel
      
      * matmul support varying mesh in pipeline parallel
      
      * default dist op support varying mesh in pipeline parallel
      
      * dist attribute for startup program
      
      * default dist op support varying mesh in pipeline parallel 2
      
      * partitioner support varying mesh in pipeline parallel
      
      * revise logic for auto completion
      
      * revise framework.py
      
      * revise reshard unittest
      
      * revise unittest for parallelize
      
      * chmod
      
      * fixed bug for dist embedding name mapping
      
      * Improve the interface and the underlying mechanisms of auto parallel
      
      * revise completion for backward
      
      * revise completion for update
      
      * revise completion for update
      
      * update unittest
      
      * chmod
      
      * bugfix for grad_op output var's mesh
      
      * Modify codes for pr 36744
      
      * Remove unnecessary comments in framework.py
      
      * Remove unnecessary comments in completion.py
      Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
      Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
      Co-authored-by: JZ-LIANG <38102074+JZ-LIANG@users.noreply.github.com>
      a02532b5
  2. 28 Oct, 2021 6 commits
  3. 27 Oct, 2021 12 commits
  4. 26 Oct, 2021 11 commits
  5. 25 Oct, 2021 6 commits
    • A
      [NPU] modifications for model ernie-1.0 (#36642) · 19b02d95
      Aganlengzi committed
      * [NPU] modifications for model ernie-1.0
      
      * rollback 503003 and change cast to dtype
      19b02d95
    • Z
      add op: fused_feedforward(backward) (#35611) · 2dd0a46a
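      zhangkaihuo committed
      This PR contains the backward code for fused_feedforward.
      
      Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias
      
      fused_feedforward is a fused operator. It fuses and wraps the operators of the transformer model's feed forward layer so that the frontend exposes a single interface. The fusion removes some memory accesses and kernel launch time, which improves performance.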
      2dd0a46a
    • S
      Add bincount op (#36317) · 39f19127
      smallv0221 committed
      * Add bincount op
      
      * upload cpu version
      
      * fix unittest
      
      * fix unittest
      
      * fix unittest
      
      * fix en doc
      
      * add more test
      
      * fix en doc
      
      * add more test case
      
      * fix test
      
      * fix input validation
      
      * fix input check
      
      * fix unittest
      
      * fix test
      
      * fix en doc
      39f19127
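As context for the commit above, a bincount operation counts how often each non-negative integer occurs in its input, optionally accumulating per-element weights instead of ones. The sketch below is a pure-Python illustration whose parameter names (`weights`, `minlength`) follow NumPy's `bincount`; whether the Paddle op matches this behavior in every detail is an assumption.

```python
def bincount(values, weights=None, minlength=0):
    """Count occurrences of each non-negative integer in `values`.

    If `weights` is given, bin i holds the sum of the weights of the elements
    equal to i. The result has at least `minlength` bins.
    (Illustrative sketch, not Paddle's implementation.)
    """
    if any(v < 0 for v in values):
        raise ValueError("bincount only accepts non-negative integers")
    if weights is not None and len(weights) != len(values):
        raise ValueError("weights must have the same length as values")
    size = max(max(values, default=-1) + 1, minlength)
    counts = [0] * size
    for i, v in enumerate(values):
        counts[v] += 1 if weights is None else weights[i]
    return counts


if __name__ == "__main__":
    print(bincount([1, 2, 2, 5]))
    print(bincount([0, 1, 1], weights=[0.5, 1.0, 2.0]))
```

The input checks mirror the kind of validation the commit list mentions ("fix input validation", "fix input check"), but the exact checks in the actual op are not shown in this log.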
    • Z
      Create CinnCompiler class for compiling subgraphs found by build_cinn_pass. (#36562) · 4c460378
      Zhen Wang committed
      * Init the functions of CinnCompiler.
      
      * Add the unit test for CinnCompiler.
      
      * Fix some compilation errors.
      
      * Update the UT of cinn_compiler.
      
      * Use Decomposer&OpFusion passes in CinnCompiler::CompileGraph.
      
      * Update some comments.
      
      * Uncomment some includes in build_cinn_pass.cc.
      
      * Use refs instead of ptrs as returned types of FindGraph & Compile in
      CinnCompiler.
      
      * Use the merged CinnGraphSymbolization functions in CinnCompiler.
      4c460378
    • T
      add some ops to train ssd on kunlun (#36407) · 50778ad6
      TTerror committed
      * add some ops to train ssd on kunlun
      
      * add some ops to train ssd on kunlun
      
      * add some ops to train ssd on kunlun
      
      * update cast op unittest
      
      * update cast op unittest
      
      * update cast op unittest
      
      * update xpu cmake
      
      * update cast unittest
      50778ad6
    • Z
      add op: fused_feedforward(forward) (#35843) · b18cbfb2
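      zhangkaihuo committed
      This PR contains only the forward code for fused_feedforward.
      
      Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias
      
      fused_feedforward is a fused operator. It fuses and wraps the operators of the transformer model's feed forward layer so that the frontend exposes a single interface. The fusion removes some memory accesses and kernel launch time, which improves performance.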
      b18cbfb2
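To make the fusion described above concrete, here is a rough, unfused reference for a transformer feed forward layer on a single token vector, written in plain Python. Several details are assumptions not stated in the commit: ReLU as the activation, layer normalization applied after the residual add, no learned layer-norm scale or shift, and explicit dropout masks so the result is deterministic. A fused operator would compute the same result while avoiding the intermediate lists that each step materializes here.

```python
import math


def linear(x, w, b):
    """y[j] = sum_i x[i] * w[i][j] + b[j] for one token vector x."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, keep_mask, p):
    """Inverted dropout with an explicit 0/1 mask (p must be < 1)."""
    scale = 1.0 / (1.0 - p)
    return [v * m * scale for v, m in zip(x, keep_mask)]


def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def feed_forward_unfused(x, w1, b1, w2, b2, mask1, mask2, p=0.0):
    """Each step below reads and writes a full intermediate vector."""
    h = dropout(relu(linear(x, w1, b1)), mask1, p)   # bias + activation + dropout
    y = dropout(linear(h, w2, b2), mask2, p)         # bias + dropout
    return layer_norm([xi + yi for xi, yi in zip(x, y)])  # residual + layernorm


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    print(feed_forward_unfused([1.0, 2.0], eye, [0.0, 0.0],
                               eye, [0.0, 0.0], [1, 1], [1, 1]))
```

The grouping in the comments loosely echoes the kernel names listed in the commit (fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias), but that pairing is a guess based on the names alone.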