1. 07 Mar, 2022 · 17 commits
    • [Phi]Move bincount OP to phi (#39947) · 1c29196e
      committed by 0x45f
      * move bincount OP to phi
      
      * fix dtype
      
      * set_dtype by weights or x
      
      * fix conflicts
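The commit moves bincount without describing its semantics. As a point of reference, here is a minimal pure-Python sketch that assumes NumPy-style bincount behavior (non-negative integer input, optional `weights`, optional `minlength`); it is not Paddle's kernel. The commit notes that the output dtype is set from `weights` or `x`; the sketch mirrors that only loosely, returning integer counts when no weights are given and weight sums otherwise.

```python
def bincount(x, weights=None, minlength=0):
    """Count occurrences of each non-negative integer in x.

    Sketch of NumPy-style bincount semantics (assumed), not Paddle's kernel.
    """
    if any(v < 0 for v in x):
        raise ValueError("bincount expects non-negative integers")
    if weights is not None and len(weights) != len(x):
        raise ValueError("weights must have the same length as x")
    # Output length covers the largest value seen, but is at least minlength.
    size = max(max(x, default=-1) + 1, minlength)
    # Without weights each occurrence adds 1; with weights it adds weights[i].
    out = [0] * size if weights is None else [0.0] * size
    for i, v in enumerate(x):
        out[v] += 1 if weights is None else weights[i]
    return out
```

For example, `bincount([1, 1, 3])` returns `[0, 2, 0, 1]`.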
    • cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca
      committed by Ming-Xu Huang
      * Added cuBlasLtHandle_t to device context.
      
      * Added fused_gemm_epilogue op.
      
      1. Added fused_gemm_epilogue op to leverage the cuBlasLt epilogue.
      2. Supports the fusion Act(X*Y + bias), where X must have at least 2 dims and Y exactly 2 dims.
      3. Act currently supports only ReLU (GeLU will be added in the future).
      
      * Added UT to fused_gemm_epilogue op.
      
      * Added LinearAct Pattern
      
      1. Added LinearAct to graph_pattern_detector.* to define the pattern
      described in (2).
      2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
      3. act currently supports only ReLU (GeLU will be supported in the future).
      
      * Added FuseGemmEpiloguePass
      
      1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
      fusion (GeLU will be supported in the future).
      2. Only matmul_v2 from nn.Linear is supported.
      
      * Added pybind for BuildStrategy.fuse_gemm_epilogue_.
      
      * Added UT for fuse_gemm_epilogue_pass.
      
      * GeLU support and EpilogueSingleton
      
      1. Added GeLU support to fused_gemm_epilogue op.
      2. Added EpilogueSingleton to cache auxiliary pointer.
      3. Added related UTs.
      
      * Rename cublaslt_epilogue_op to gemm_epilogue_op.*.
      
      * Added both train and infer pattern to LinearAct.
      
      1. Added support for the fwd graph with grad_ops linking to LinearAct.
      2. Added the related changes to fuse_gemm_epilogue_pass for the above
      modification.
      
      * Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.
      
      * Added identity activation support to gemm_epilogue_op.
      
      * Added Linear Fusion (matmul_v2 + ele_add)
      
      1. Added matmul_v2 + ele_add pattern to LinearActPattern.
      2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.
      
      * Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*
      
      * Add fused_gemm_epilogue_grad op.
      
      1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.
      
      * Add UTs to fused_gemm_epilogue_grad_op.
      
      * Change attribute name in fused_gemm_epilogue_grad_op for clarity.
      
      * Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op.
      
      * Added ElementwiseAdd+Matmul+Act graph pattern detection.
      
      * Fuse backward of Linear(Act(x))
      
      1. Added backward fusion pass to Linear(Act(x)).
      2. Added backward fusion pass to Linear(x).
      
      * Added UTs to backward fusion of Linear(Act(x)).
      
      * Complete documentation of the arguments to fused_gemm_epilogue_op.
      
      * Made arguments of some functions pass by reference.
      
      * Modify code with review comments.
      
      1. Made arguments of some functions pass by reference.
      2. Removed redundant code.
      3. Updated code to follow the Google code style.
      
      * Made 'const' code style consistent
      
      * Fixed random seed of python UTs.
      
      * Set compilation constraints for cuBlasLt
      
      1. Require CUDA 11.6+
      2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.
      
      * Code review from Paddle
      
      1. Changed argument name is_first_gemm to without_x_gradient for
      clarity.
      2. Applied PADDLE_THROW in fused_gemm_epilogue_op.
      
      * Remove EpilogueSingleton
      
      1. Replaced EpilogueSingleton with ReserveSpace for passing auxiliary
      pointers between FWD and BWD.
      
      * Fix a logical error and enhance UTs.
      
      1. Added act op count checking in UTs.
      2. Fixed the issue with fusing the backward of ReLU(Linear(X)).
      3. TODO: solve GELU fusion issues.
      
      * Fix Linear and GeLU fusion issues.
      
      1. Modified graph_pattern_detector to match Linear with either GeLU or
      ReLU.
      2. Modified the data range in UTs to allow negative values.
      
      * Removed fused_gemm_epilogue_op.h.
      
      * Rename namespace pten to phi.
      
      * Rename arguments in fused_gemm_epilogue_op
      
      1. bias -> Bias.
      2. out -> Out.
      3. reserve_space -> ReserveSpace.
      
      * Change EpiloguePassActivationCache into a local variable.
      
      1. Removed the singleton in EpiloguePassActivationCache.
      2. Made EpiloguePassActivationCache an argument of each pass
      function.
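The commit messages above describe the fused op as computing Act(X*Y + bias), replacing the unfused act(element_add(matmul_v2(x, w), bias)) chain. The following pure-Python reference shows that unfused computation for a 2-D X, the kind of result a fused kernel could be checked against. It is a sketch under stated assumptions: `linear_act_reference` is a hypothetical name, leading dimensions of a higher-rank X are assumed to be flattened into rows beforehand, and GeLU uses the exact erf form, whereas a fused epilogue may use the tanh approximation.

```python
import math


def _relu(v):
    return v if v > 0.0 else 0.0


def _gelu(v):
    # Exact erf-based GELU; a fused kernel may use the tanh approximation instead.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


ACTIVATIONS = {"identity": lambda v: v, "relu": _relu, "gelu": _gelu}


def linear_act_reference(x, w, bias, activation="relu"):
    """Unfused reference for Out = Act(X @ W + bias) (hypothetical helper).

    x: M x K rows (higher-rank inputs assumed flattened to M rows),
    w: K x N (must be 2-D), bias: length N.
    """
    act = ACTIVATIONS[activation]
    k, n = len(w), len(w[0])
    if any(len(row) != k for row in x) or len(bias) != n:
        raise ValueError("shape mismatch")
    out = []
    for row in x:
        # matmul_v2, then elementwise_add of the bias, then the activation.
        out.append([act(sum(row[p] * w[p][j] for p in range(k)) + bias[j])
                    for j in range(n)])
    return out
```

With `activation="identity"` the same function covers the plain matmul_v2 + elementwise_add fusion mentioned above.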
    • [phi] move is_empty to phi (#39919) · 72964335
      committed by WJJ1995
      * Add is_empty
      
      * fixed for CI
      
      * fixed code style
      
      * resolve conflict
      
      * deal with comments
      
      * replace pt by pd
    • Add mlir trt engine type. (#40197) · 6fd96a04
      committed by Wilber
      * infrt add trt engine
      
      * update engine name
    • [Phi]Move elementwise_div grad/double grad Kernel to Phi (#40172) · c52a664e
      committed by YuanRisheng
      * move elementwise_div grad
      
      * change mutable_data to alloc
      
      * fix compile bugs
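The commit moves the elementwise_div gradient kernels. For out = x / y, the gradients are dx = dout / y and dy = -dout * x / y^2, which can be rewritten as -dout * out / y to reuse the forward output. A minimal sketch for same-shape inputs follows; broadcasting, which the real op supports, is omitted, the double-grad kernel is not covered, and the function names are hypothetical.

```python
def div_forward(x, y):
    """out = x / y, elementwise, for same-shape inputs (no broadcasting)."""
    return [a / b for a, b in zip(x, y)]


def div_grad(y, out, dout):
    """Gradients of out = x / y given the upstream gradient dout.

    dx = dout / y
    dy = -dout * x / y**2, written as -dout * out / y to reuse the forward output.
    """
    dx = [g / b for g, b in zip(dout, y)]
    dy = [-g * o / b for g, o, b in zip(dout, out, y)]
    return dx, dy
```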
    • fix infer shapes of pool_with_index (#40139) · 0fb6bca4
      committed by Wei Shengyu
      * dbg pool infer shapes
      
      * dbg
      
      * fix format
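The commit title does not spell out the shape rule. A common rule for 2-D max pooling with index, sketched below as an assumption rather than Paddle's exact shape-inference logic, gives each spatial output size as (in + 2 * padding - ksize) // stride + 1, and in adaptive mode treats ksize as the requested output size. Out and Mask share one shape, since Mask holds one argmax index per output element. The function name is hypothetical and ceil mode is not modeled.

```python
def pool_with_index_shapes(x_shape, ksize, strides, paddings, adaptive=False):
    """Infer (Out, Mask) shapes for NCHW max pooling with index (assumed rule)."""
    n, c = x_shape[0], x_shape[1]
    spatial = []
    for in_size, k, s, p in zip(x_shape[2:], ksize, strides, paddings):
        if adaptive:
            # In adaptive mode ksize is read as the target output size.
            spatial.append(k)
        else:
            # Floor-mode sliding-window output size.
            spatial.append((in_size + 2 * p - k) // s + 1)
    out_shape = [n, c] + spatial
    # Mask stores one argmax index per Out element, so the shapes match.
    return out_shape, list(out_shape)
```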
    • [Phi] Fix macro name typo (#40204) · 55a3bfbd
      committed by Aurelius84
    • fix_conv2d_trt_convert_test_case (#39882) · d255bfe0
      committed by JingZhuangzhuang
      * fix_conv2d_trt_convert_test_case
      
      * fix_conv2d_trt_convert_test_case
      
      * fix_conv2d_trt_convert_test_case
      
      * fix_conv2d_trt_convert_test_case
    • [Phi] Remove storage deps of empty (#40136) · b46e49de
      committed by Chen Weihang
      * remove storage deps of empty
      
      * remove invalid empty method
      
      * remove error empty using
      
      * fix test_sparse_utils_dev_api
      
      * revert some sparse change
      
      * add memset for conv grad
      
      * resolve conflict
      
      * resolve conflict
      
      * resolve conflict
    • [bf16] add bf16 kernel: gaussian_random fill_constant fill_any_like (#40027) · 6a0d60d2
      committed by zhangbo9674
      * add gaussian random
      
      * add full
      
      * refine reduce
      
      * refine code
      
      * refine gaussian_random unittest
      
      * add unittest for fill_any_like fill_constant
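The bf16 commits add kernels that operate on bfloat16, a 16-bit format that keeps float32's sign bit, all 8 exponent bits, and the top 7 mantissa bits. A minimal sketch of float32 to bfloat16 conversion with round-to-nearest-even follows; it illustrates the format only, is not the conversion routine Paddle uses, and the helper names are hypothetical.

```python
import struct


def float_to_bf16_bits(value):
    """Round a float32 to bfloat16 (round-to-nearest-even); return the 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN: truncate and set the quiet bit so the result stays NaN.
        return (bits >> 16) | 0x0040
    # Add 0x7FFF plus the lowest kept bit, then truncate: exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits16):
    """Widen bfloat16 bits back to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]
```

For instance, 1.00390625 lies exactly halfway between two bfloat16 values and rounds down to 1.0 (`0x3F80`), the neighbor with an even last bit.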
    • [phi] move multi_dot OP (#40038) · fd36ede6
      committed by Liu-xiandong
      * [phi] move multi_dot OP
      
      * fix the segment bug
      
      * fix bug
      
      * delete useless comment
      
      * fix CI bug
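multi_dot multiplies a chain of matrices, and the evaluation order can change the cost dramatically without changing the result. A sketch of the classic matrix-chain dynamic program that picks the cheapest parenthesization follows. NumPy documents that `numpy.linalg.multi_dot` uses an optimal parenthesization, but the commit does not say whether Paddle's kernel uses this exact algorithm, and `matrix_chain_order` is a hypothetical name.

```python
def matrix_chain_order(dims):
    """Classic O(n^3) matrix-chain DP.

    dims has length n + 1: matrix i has shape dims[i] x dims[i + 1].
    Returns (minimum scalar multiplications, parenthesization string).
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # sub-chain length
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):           # try every split point
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j], split[i][j] = c, k

    def paren(i, j):
        if i == j:
            return f"A{i}"
        k = split[i][j]
        return f"({paren(i, k)} {paren(k + 1, j)})"

    return cost[0][n - 1], paren(0, n - 1)
```

For shapes 10x100, 100x5 and 5x50, `matrix_chain_order([10, 100, 5, 50])` returns `(7500, '((A0 A1) A2)')`, versus 75000 multiplications for the other order.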
    • [AutoParallel]engine support pp (#40084) · 71cb016c
      committed by zhaoyingli
      * engine support pp
      
      * fix format
      
      * avoid multi print
      
      * fix convert
      
      * bug fix
      
      * add pp unittest
    • [bf16] add bf16 kernel: sigmoid & sqrt & softplus & square (#40004) · 98c427e2
      committed by zhangbo9674
      * add activ
      
      * refine unittest
      
      * refine unittest
      
      * refine unittest
      
      * refine unittest
      
      * refine code
    • [MLU]support reduce tensors on mlu (#40000) · b4eb413e
      committed by zn
      * [MLU]support reduce tensors on mlu
      
      * [MLU]fix compiler options
    • initialize processgroupnccl with store (#40181) · 0ad25fb9
      committed by lilong12
    • [Phi]Migrate Adamax and Adadelta Optimizer Op into Phi (#40173) · f5ec0314
      committed by Aurelius84
      * [Phi]Migrate Adamax into phi
      
      * Add adadelta kernel
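For reference, the textbook update rules of the two migrated optimizers are sketched below for a single scalar parameter, following Kingma & Ba (Adamax) and Zeiler (Adadelta). This is an assumption-laden sketch rather than Paddle's kernels: the default hyperparameters, the state names, the `lr` argument of Adadelta (the original paper has none), and where epsilon is added are illustrative and may differ from Paddle's implementation.

```python
import math


def adamax_step(param, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adamax update; t is the 1-based step count (illustrative)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    u = max(beta2 * u, abs(grad))                 # exponentially weighted infinity norm
    param -= (lr / (1.0 - beta1 ** t)) * m / (u + eps)  # bias-corrected step
    return param, m, u


def adadelta_step(param, grad, avg_sq_grad, avg_sq_update, rho=0.95, eps=1e-6, lr=1.0):
    """One scalar Adadelta update; lr=1.0 reproduces the paper's rule (assumed arg)."""
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
    # Step scaled by RMS of past updates over RMS of gradients.
    update = -math.sqrt(avg_sq_update + eps) / math.sqrt(avg_sq_grad + eps) * grad
    avg_sq_update = rho * avg_sq_update + (1.0 - rho) * update * update
    param += lr * update
    return param, avg_sq_grad, avg_sq_update
```

On the first Adamax step the bias correction makes the step size roughly `lr`, which is easy to check by hand.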
    • [AMP] refine paddle.amp.decorate code example (#40159) · da3de72d
      committed by zhangbo9674
      * refine amp.decorate code example
      
      * refine code
  2. 06 Mar, 2022 · 3 commits
  3. 05 Mar, 2022 · 5 commits
  4. 04 Mar, 2022 · 15 commits