1. 31 Aug, 2023 1 commit
    • Add fused_scale_bias_relu_conv_bnstats OP (#55026) · 71e28b12
      Tian Zheng committed
      * Add fused_scale_bias_relu_conv_bnstats op
      
      * Review changes
      
      * Fix no CUDNN Frontend build
      
      * Fix PADDLE_ENFORCE format
      
      * Fix PADDLE_ENFORCE CI error
      
      * Rename kernel filename
      
      * Refactor unittest to use paddle eager_op_test
      
      * Fix padding bugs
      
      * Review changes
      
      * test=cuda117
      
      * test=cuda117
  2. 19 Apr, 2023 1 commit
  3. 21 Feb, 2023 1 commit
  4. 18 Nov, 2022 1 commit
    • CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b
      Tian Zheng committed
      * Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation
      
      * Fix macro
      
      * Add implementation for conv_kernel and conv_grad_kernel
      
      * Modification after rebase onto latest develop
      
      * Modify plan cache to comply with the API of phi::autotune
      
      * Refactor to reduce duplicate code
      
      * Review fix:
      - move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernel.cu
      - add const specifier for input tensor
      - add logging when plans fail to execute
      - move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h
      
      * Move plan building outside of cache
      
      * Fix ROCM build
  5. 11 Nov, 2022 1 commit
  6. 25 Aug, 2022 1 commit
    • optimize conv algo cache (#41891) · 1cd7e68b
      hong committed
      * optimize conv algo speed
      
      * code polish
      
      * remove useless code
      
      * fix compile error
      
      * fix cpu compile error
      
      * not use cudnn algo t
      
      * add search cache max number
      
      * polish code
      
      * fix cache test bug
      
      * add groups data format to conv args
      
      * fix cache test bug
      
      * fix cudnn_deterministic bug
      
      * fix test switch auto tune bug
      
      * fix test switch autotune bug
      
      * fix conv cache bug
      
      * fix cache test error
      
      * fix cache test bug
      
      * fix windows mac compile error
      
      * fix workspace search error
      
      * update cudnn cache
      
      * fix cache test bug; test=develop
      
      * fix autotune switch test error
      
      * polish code
      
      * polish code
  7. 01 Jul, 2022 1 commit
    • Addition of switch_auto_tune option for transpose op (#43310) · 53d5abe3
      limingshu committed
      * 2nd part of transpose update
      
      * add switch_auto_tune option.
      
      * add some changes according to Ci
      
      * refine the structure of auto_tune_base.
      
      * merge develop changes
      
      * reset the switch_set_range and change unittest of transpose auto-tune
      
      * change the kernel auto-tune logic
  8. 05 Jun, 2022 1 commit
  9. 15 Apr, 2022 1 commit
    • Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda
      limingshu committed
      * change cudnn helper for auto-tune
      
      * Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.
      
      * Fix the bug in calculating and printing current step cache hit rate.
      
      * Improve the autotune cache and fix unittest.
      
      * Change the key from AlgorithmType to int64_t.
      
      * Fix unittest for cpu-only env.
      
      * change ChooseAlgoByWorkspace for heuristic mode
      Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
  10. 05 Apr, 2022 1 commit
    • Implement AutoTuneStatus class for Kernel Auto Tune (#41218) · b0f8000e
      Zhang Ting committed
      * switch autotune
      
      * implement AutoTuneCache
      
      * implement AutoTuneCache class
      
      * add pybind api
      
      * add dygraph test
      
      * support static mode and eager mode and improve unittests
      
      * rename the SwitchAutoTune Class and improve tests
      
      * improve AutoTuneStatus and reduce the cost of tests
  11. 03 Mar, 2022 1 commit
  12. 25 Feb, 2022 1 commit
    • move eye, size, erfinv, pixel_shuffle OP to phi (#39712) · 639675de
      0x45f committed
      * move eye OP to pten
      
      * move size OP to pten
      
      * merge develop
      
      * fix merge
      
      * move files
      
      * move erfinv OP to phi
      
      * remove comment
      
      * move pixel_shuffle OP to phi
      
      * remove comment
      
      * fix PT_REGISTER
      
      * fix NPU
      
      * fix CR
      
      * remove size_sig.cc for PR-CI-Coverage
  13. 20 Feb, 2022 1 commit
  14. 15 Feb, 2022 1 commit
    • Move Abs OP to pten (#39492) · fb473067
      From00 committed
      * Move Abs op to pten
      
      * Fix NPU compilation error
      
      * Fix CI error
      
      * Use LaunchSameDimsElementwiseCudaKernel in pten
  15. 28 Jan, 2022 1 commit
    • Move digamma to pten (#39240) · 848ae7dc
      hong committed
      * move digamma to pten; test=develop
      
      * fix mutable_data bugs; test=develop
      
      * remove useless code; test=develop
      
      * remove kernel compute; test=develop
      
      * fix bug; test=develop
  16. 17 Jan, 2022 1 commit
  17. 12 Jan, 2022 1 commit
    • the_one_ps dirs reconstruct (#38804) · 50609214
      ziyoujiyi committed
      * delete gloo connect retry
      
      * the_one_ps dirs reconstruct
      
      * .
      
      * .
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * create the_one_ps dirs
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
      
      * the one ps dirs modify
  18. 03 Nov, 2021 1 commit
  19. 18 Sep, 2021 1 commit
    • Basic PR on Cost Model (#35774) · 5ba9fe6e
      Huihuang Zheng committed
      Add a basic Cost Model that uses the executor to run a program and profile it to get op times.
      
      This is an early, basic version; more functions will be added in the future.
  20. 15 Sep, 2020 1 commit
  21. 03 Jun, 2020 1 commit
    • Add crypto python (#24836) · aa47356b
      Yanghello committed
      * add crypto helper for paddle, test=develop
      
      * cryptopp.cmake bug fixed, test=develop
      
      * remove debug build type, test=develop
      
      * fixed CMakeLists for new target, test=develop
      
      * fix CI bug, test=develop
      
      * add cmake option flag DWITH_CRYPTO, test=develop
      
      * add crypto api for python, test=develop
      
      * Revert "add crypto api for python, test=develop"
      
      This reverts commit 3a1cfa9d.
      
      * Revert "Add crypto api (#24694)"
      
      This reverts commit 5a7a517c.
      
      * Revert "Revert "Add crypto api (#24694)""
      
      This reverts commit f952b19f.
      
      * fixed cryptopp cmake building error, test=develop
      
      * change WITH_CRYPTO building option to OFF, test=develop
      
      * fixed cipher test failed, test=develop
      
      * "add crypto api for python, test=develop"
      
      This reverts commit 83fb55c0.
      
      * travis CI bug fixed, test=develop
      
      * fixed test in python3
      
      * test=develop
      
      * fixed unittest, test=develop
  22. 21 Jan, 2019 1 commit
  23. 10 Jan, 2019 1 commit
  24. 13 Dec, 2018 1 commit
    • fix cmake · deb0d41c
      sneaxiy committed
      fix cmake again
      test=develop
  25. 10 Dec, 2018 1 commit
  26. 10 Sep, 2018 1 commit
  27. 18 Jun, 2018 1 commit
  28. 24 May, 2018 1 commit
  29. 23 May, 2018 1 commit
  30. 22 Mar, 2018 1 commit
  31. 07 Mar, 2018 2 commits
  32. 06 Mar, 2018 2 commits
  33. 15 Feb, 2018 1 commit
    • Update tensor_util.h (#8422) · cfffb1a3
      Yi Wang committed
      * Update tensor_util.h
      
      * Update with moved TensorDesc
      
      * Fix tensor_util.cu
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Make tensor_util.cu a symbolic link
  34. 10 Feb, 2018 2 commits
  35. 07 Feb, 2018 1 commit
  36. 06 Feb, 2018 2 commits