1. 13 Apr, 2023: 1 commit
  2. 10 Apr, 2023: 1 commit
  3. 20 Mar, 2023: 1 commit
    • Support Linear operation in cuBlaslt and plug into attn_gemm and fusedLinear forward op (#51124) · 2dfc3fa8
      limingshu committed
      * optimization for fused linear op
      
      * fix code format
      
      * optimization for linear fused forward
      
      * merge with develop
      
      * fix bugs for gemm_ephilog
      
      * package cublaslt epilogue types into an enum
      
      * final fix before code reviewing
      
      * fix missed fusedType typo
      
      * fix code according to review suggestions
      
      * fix windows ci error
      
      * change location of MatmulPlanner
      
      * add some changes for compiler error fix
      
  4. 15 Mar, 2023: 1 commit
  5. 02 Mar, 2023: 1 commit
    • Cache for cublaslt descriptor (#50931) · 819f8939
      limingshu committed
      * first commit
      
      * finish base work
      
      * modification for good
      
      * fix for cache setting and gather the algo and desc as one data for cache storage
      
      * fix for cache setting and gather the algo and desc as one data for cache storage
      
      * install pre-commit check
  6. 26 Feb, 2023: 1 commit
  7. 21 Feb, 2023: 1 commit
  8. 25 Jan, 2023: 1 commit
  9. 14 Dec, 2022: 1 commit
  10. 24 Nov, 2022: 1 commit
  11. 18 Nov, 2022: 1 commit
    • CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b
      Tian Zheng committed
      * Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation
      
      * Fix macro
      
      * Add implementation for conv_kernel and conv_grad_kernel
      
      * Modification after rebase onto latest develop
      
      * Modify plan cache to comply with the API of phi::autotune
      
      * Refactor to reduce duplicate code
      
      * Review fix:
      - move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernelk.cu
      - add const specifier for input tensor
      - add logging when plans fail to execute
      - move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h
      
      * move plan building outside of cache
      
      * Fix ROCM build
  12. 11 Nov, 2022: 1 commit
  13. 10 Nov, 2022: 1 commit
  14. 08 Nov, 2022: 1 commit
  15. 01 Nov, 2022: 1 commit
    • Fix bugs in tranpose kernel (#47212) · ec7fe888
      limingshu committed
      * first commit
      
      * transpose_kernel_optimization
      
      * first completion of transpose op
      
      * second commit
      
      * refine code logics of tranpose_kernel
      
      * refine transpose kernel
      
      * first commit
      
      * fix DtoD copy bugs for hip
      
      * refine code according to the PR advice
      
      * change dim to int64_t type.
      
      * fix some type error
  16. 19 Oct, 2022: 1 commit
  17. 28 Sep, 2022: 1 commit
  18. 22 Sep, 2022: 1 commit
  19. 14 Sep, 2022: 1 commit
  20. 25 Aug, 2022: 1 commit
    • optimize conv algo cache (#41891) · 1cd7e68b
      hong committed
      * optimize conv algo speed
      
      * code polish
      
      * remove useless code
      
      * fix compile error
      
      * fix cpu compile error
      
      * not use cudnn alog t
      
      * add search cache max number
      
      * polish code
      
      * fix cache test bug
      
      * add groups data format to conv args
      
      * fix cache test bug
      
      * fix cudnn_deterministic bug
      
      * fix test switch auto tune bug
      
      * fix test switch autotune bug;
      
      * fix conv cache bug
      
      * fix cache test error
      
      * fix cache test bug
      
      * fix windows mac compile error
      
      * fix workspace search error
      
      * update cudnn cache
      
      * fix cache test bug; test=develop
      
      * fix autotune switch test error
      
      * polish code
      
      * polish code
  21. 15 Jul, 2022: 1 commit
  22. 01 Jul, 2022: 1 commit
    • Addition of switch_auto_tune option for transpose op (#43310) · 53d5abe3
      limingshu committed
      * 2nd part of transpose update
      
      * add switch_auto_tune option.
      
      * add some changes according to CI
      
      * refine the structure of auto_tune_base.
      
      * merge develop changes
      
      * reset the switch_set_range and change unittest of transpose auto-tune
      
      * change the kernel auto-tune logic
  23. 24 Jun, 2022: 1 commit
    • [Phi]Change Copy from Kernel to basic component utils (#43622) · 2739bd73
      YuanRisheng committed
      * perfect copy
      
      * deal with conflict
      
      * deal with conflict
      
      * fix compile bugs
      
      * fix unittest bugs
      
      * change code format
      
      * deal with conflict
      
      * modify code by review
      
      * fix ce bugs
      
      * fix ce bugs
      
      * add lo
      
      * perfect code format
      
      * deal with conflicts
  24. 07 Jun, 2022: 1 commit
  25. 05 Jun, 2022: 1 commit
  26. 04 Jun, 2022: 1 commit
  27. 15 Apr, 2022: 1 commit
    • Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda
      limingshu committed
      * change cudnn helper for auto-tune
      
      * Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.
      
      * Fix the bug in calculating and printing current step cache hit rate.
      
      * Improve the autotune cache and fix unittest.
      
      * Change the key from AlgorithmType to int64_t.
      
      * Fix unittest for cpu-only env.
      
      * change ChooseAlgoByWorkspace for heuristic mode
      Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
  28. 09 Apr, 2022: 1 commit
  29. 06 Apr, 2022: 1 commit
  30. 05 Apr, 2022: 1 commit
    • Implement AutoTuneStatus class for Kernel Auto Tune (#41218) · b0f8000e
      Zhang Ting committed
      * switch autotune
      
      * implement AutoTuneCache
      
      * implement AutoTuneCache class
      
      * add pybind api
      
      * add dygraph test
      
      * support static mode and eager mode and improve unittests
      
      * rename the SwitchAutoTune Class and improve tests
      
      * improve AutoTuneStatus and reduce the cost of tests
  31. 31 Mar, 2022: 2 commits
    • 7dfd3846
    • add_autotune_kernel_tool (#40658) · 7c5dca9f
      limingshu committed
      * for 1st time interface combine.
      
      * modification with kernel factory
      
      * first auto_tune version.
      
      * first version.
      
      * basic version
      
      * add warm up step.
      
      * a debug version.
      
      * optimize the functionality of class auto_tuner.
      
      * add some quotes for optimized auto_tuner class.
      
      * add some quotes for optimized auto_tuner class.
      
      * add namespace.
      
      * modification according to the advices
      
      * replace fluid header with phi header.
      
      * replace fluid header with phi header.
  32. 25 Mar, 2022: 1 commit
  33. 23 Mar, 2022: 1 commit