1. 30 Jan, 2023 1 commit
  2. 18 Jan, 2023 1 commit
  3. 16 Jan, 2023 1 commit
    • CUDA12.0 integration (#49539) · 1885d55a
      zlsh80826 committed
      * Update warpctc for cuda-12
      
      * Deprecate cudaProfilerInitialize for CUDA > 11
      
      * Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040
      
      * Add the missing thrust header
  4. 13 Jan, 2023 3 commits
  5. 11 Jan, 2023 1 commit
    • Implement a common segmented array. (#49450) · b1faa562
      Yiqun Liu committed
      * Implement a common PointerArray.
      
      * Polish codes.
      
      * Add including of header file.
      
      * Add the branch of kFix8.
      
      * Fix compiling error.
      
      * Add alignas hint to fix the performance drop.
      
      * Optimize the H2D copy in stack_grad.
      
      * Rename the macro.
      
      * Fix align hint for different compilers.
      
      * Polish the define of PADDLE_ALIGN.
      
      * Fix compiling error.
      
      * Remove the align hint on windows.
  6. 10 Jan, 2023 2 commits
  7. 09 Jan, 2023 2 commits
  8. 04 Jan, 2023 1 commit
  9. 03 Jan, 2023 2 commits
  10. 26 Dec, 2022 1 commit
  11. 20 Dec, 2022 1 commit
  12. 19 Dec, 2022 2 commits
  13. 16 Dec, 2022 1 commit
  14. 15 Dec, 2022 1 commit
  15. 14 Dec, 2022 1 commit
  16. 12 Dec, 2022 2 commits
  17. 08 Dec, 2022 5 commits
  18. 07 Dec, 2022 1 commit
  19. 05 Dec, 2022 5 commits
  20. 03 Dec, 2022 1 commit
  21. 02 Dec, 2022 2 commits
    • Split common funcs from reduction and structure modification (#46970) · ef575d6a
      Bo Zhang committed
      * profile reduce kernel for fp16 and reduceHigherdim
      
      * use reinterpret_cast
      
      * fix for CI on ROCm
      
      * add Macro for ROCm
      
      * ROCm CI config
      
      * ROCm CI config
      
      * unit test repair
      
      * pull
      
      * add common_funcs.h
      
      * reduceType
      
      * Update reduce_function.h
      
      * not higher
      
      * rename
    • [Eager] Optimize Grad by prune useless branch (#47827) · d1e93be1
      Jiabin Yang committed
      * [Eager] Fix paddle.grad interface
      
      * [Eager] Support minimum SubGraph for GeneralGrad
      
      * Add needed_nodes to prune grad graph more thoroughly
      
      * [Eager] Add grad_node_trans_mapping_ to record which grad_node has been transformed to AccumulationNode
      
      * [Eager] Fix paddle.grad interface
      
      * Polish code
      
      * remove potential_stop_node
      
      * Add endding_nodes to enhance genSugraph logic
      
      * clear endding_nodes_
      
      * polish code
      
      * rename endding_nodes to endding_nades_
      
      * Refactor grad interface
      
      * Add register_hook case to fix coverage-ci
      
      * Fix code format
      
      * Refactor general_grad
      
      * Add more code comments
      
      * call clear directly to release GradSlotMeta
      
      * fix a mistake
      
      * fix matmul/ multiply kernel logic and optional input in yaml, fill zeros logic and so on.
      
      * fix batch_norm_double_grad yaml optional config
      
      * fix tanh_triple_grad yaml and kernels
      
      * fix MultiplyTripleGradKernel optional logic
      
      * fix merge mistake
      
      * fix compile error
      
      * remove legacy attr for bn
      
      * polish code
      
      * fix some kernel
      
      * merge develop
      
      * fix error
      
      * remote log
      
      * fix kernel with full like
      
      * hide value log behind
      
      * hide value log behind
      
      * fix matmul_triple grad
      Co-authored-by: NWeilong Wu <veyron_wu@163.com>
  22. 30 Nov, 2022 1 commit
  23. 29 Nov, 2022 1 commit
  24. 28 Nov, 2022 1 commit
    • [PHI decoupling] move several header files from fluid to phi (#48415) · fd9c91c3
      huangjiyi committed
      * decouple cudnn_desc.h from fluid
      
      * move cudnn_desc.h from fluid to phi
      
      * fix bugs
      
      * decouple cudnn_helper.h from fluid
      
      * fix bugs
      
      * move cudnn_helper.h from fluid to phi
      
      * add fluid cudnn_helper.h
      
      * move miopen_desc.h from fluid to phi
      
      * move miopen_helper.h from fluid to phi
      
      * fix bugs
      
      * move gpu_dnn.h from fluid to phi
      
      * fix bugs
      
      * update copyright year
      
      * simplify gpu_dnn.h in fluid
      
      * fix bugs
      
      * fix xpu build bug
      
      * fix compile bug
      
      * fix bug