1. 23 Jun, 2022 1 commit
  2. 22 Jun, 2022 2 commits
    • Y
      Optimize linspace to avoid GPU -> CPU copy. (#42750) (#43746) · 4dcfc6df
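      Committed by Yiqun Liu
      Cherry-pick of #42750.
      
      QA reports that the optimization in #42750 can improve solov2 model performance by 6%, so it is cherry-picked to 2.3. #41096 moved the Python implementation of linspace from fluid.layers.tensor to paddle.tensor.creation, but that PR is not in the release/2.3 branch, so the Python changes from #42750 are applied to fluid.layers.tensor.linspace instead.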
      4dcfc6df
    • Z
      [cherry pick] Support optional residual add in fused ops and slice large... · 0660d5f2
      Committed by Zhang Ting
      [cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax (#43719)
      
       [cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax
      
      cherry-pick #43635 #43681 #43474
      0660d5f2
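The commit title above mentions slicing large tensors for cudnn_softmax. As a rough illustration of the general idea only, the sketch below applies a row-wise softmax in slices along the outer dimension, which leaves each row's result unchanged. It is plain Python, not Paddle's CUDA code; the names `softmax_row`, `softmax_sliced`, and `max_rows` are hypothetical, and slicing along the outer dimension is an assumption not stated in the commit message.

```python
import math


def softmax_row(row):
    """Numerically stable softmax over a single row."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_sliced(rows, max_rows):
    """Apply row-wise softmax in slices of at most max_rows rows.

    Hypothetical helper: max_rows stands in for whatever size limit a
    backend imposes on a single call.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    out = []
    for start in range(0, len(rows), max_rows):
        chunk = rows[start:start + max_rows]
        out.extend(softmax_row(r) for r in chunk)
    return out


data = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [-1.0, 5.0, 2.0]]
full = [softmax_row(r) for r in data]         # one pass over everything
sliced = softmax_sliced(data, max_rows=2)     # two slices: 2 rows + 1 row
```

Because softmax is computed independently per row, the sliced and unsliced results match; only the number of rows handled per call changes.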
  3. 20 Jun, 2022 1 commit
  4. 15 Jun, 2022 1 commit
  5. 14 Jun, 2022 1 commit
    • X
      [ CherryPick ] Cherry pick for einsum optimization. (#43468) · 22e75d92
      Committed by xiongkun
      * [EinsumOp] Polish forward logic and backward logic for optimize (#42603)
      
      * change logic for optimize
      
      * modify
      
      * merge
      
      * change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 (#43010)
      
      * [EinsumOp] Make EinsumOp support bfloat16. (#43085)
      
      * change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0
      
      * make EinsumOp support bf16
      
      * add unittest for BF16
      
      * add condition for test_BF16
      
      * fix bugs
      
      * fix
      
      * change the backward api to fit einsum op
      22e75d92
  6. 08 Jun, 2022 1 commit
    • N
      Replace ReduceAmax/Amax.part.cu with KP (#43202) (#43263) · e161979e
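      Committed by niuliling123
      The original implementations of reduce amax/amin and frobenius_norm_kernel were Eigen-based and took a long time to compile, so this PR replaces them with KP implementations.
      Removes the duplicated functionality in DefaultElementwiseOperator to reduce the compile time of the elementwise_double_grad op.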
      e161979e
  7. 07 Jun, 2022 1 commit
  8. 06 Jun, 2022 1 commit
    • N
      cherry-pick 42645 (#43205) · 835a1888
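      Committed by niuliling123
      Removes the rank instantiation and the Elementwise call in the Broadcast function to reduce compile time.
      Adapted from PR #42645 on the develop branch. Because the develop and release branches differ substantially, a cherry-pick was not possible, so the PR was resubmitted against release 2.3.
      The rank instantiation in Broadcast causes heavy expansion of the underlying templates, which makes the reduce_sum_grad_kernel.cu.o file too large. This change reduces both the .o size and the compile time.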
      835a1888
  9. 06 May, 2022 1 commit
  10. 04 May, 2022 1 commit
  11. 30 Apr, 2022 1 commit
  12. 28 Apr, 2022 5 commits
  13. 26 Apr, 2022 1 commit
    • C
      [Cherry-pick] Optimize dygraph performance part2 (#42224) · ab24b9c0
      Committed by Chen Weihang
      * Add paddle::variant and replace paddle::any (#42139)
      
      * add variant and replace any
      
      * split attribute
      
      * Optimize dygraph GetExpectedKernelType perf (#42154)
      
      * opt dygraph scheduling
      
      * revert part impl
      
      * fix variant compile error (#42203)
      
      * replace any by variant in infermeta (#42181)
      ab24b9c0
  14. 25 Apr, 2022 1 commit
    • A
      [Cherry-Pick][Performance]Remove CudaStreamSynchronize in ClipGradByGlobalNorm... · 58d0d15e
      Committed by Aurelius84
      [Cherry-Pick][Performance]Remove CudaStreamSynchronize in ClipGradByGlobalNorm and fix shape op (#42170)
      
      * [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT (#42138)
      
      * [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT
      
      * [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT
      
      * [Performance]Remove CudaStreamSynchronize in ClipGradByGlobalNorm (#42132)
      58d0d15e
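For readers unfamiliar with the operation this commit optimizes, the sketch below shows global-norm gradient clipping in plain Python. It assumes the common formulation `scale = clip_norm / max(global_norm, clip_norm)`, which needs no branch on whether the norm exceeds the threshold. The function name is hypothetical, this is not Paddle's implementation, and the commit message does not say whether this formulation relates to the removed synchronization.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients so that their joint L2 norm is at most clip_norm.

    grads is a list of flat lists of floats (one list per parameter).
    Returns the clipped gradients and the global norm before clipping.
    """
    if clip_norm <= 0:
        raise ValueError("clip_norm must be positive")
    sq_sum = sum(g * g for grad in grads for g in grad)
    global_norm = math.sqrt(sq_sum)
    # Branch-free scale: equals 1.0 when global_norm <= clip_norm.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm


# Global norm is sqrt(9 + 16 + 144) = 13, so the scale is 6.5 / 13 = 0.5.
clipped, norm = clip_by_global_norm([[3.0, 4.0], [12.0]], clip_norm=6.5)

# Norm is below the threshold here, so the gradients pass through unchanged.
small, small_norm = clip_by_global_norm([[0.3, 0.4]], clip_norm=1.0)
```

Note that the norm is taken over all gradients jointly, so every parameter is scaled by the same factor and the direction of the overall update is preserved.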
  15. 21 Apr, 2022 2 commits
  16. 19 Apr, 2022 4 commits
  17. 18 Apr, 2022 2 commits
  18. 15 Apr, 2022 3 commits
  19. 14 Apr, 2022 2 commits
    • C
      Cherry pick final state ops (#41755) · 921a6fb7
      Committed by chentianyu03
      * [Yaml]add exp yaml (#41217)
      
      * add exp yaml
      
      * add exp api in test case
      
      * add determinant yaml
      
      * fix exp op unittest
      
      * change test class name
      
      * modify api name
      
      * compacted with raw api
      
      * fix det api
      
      * add python_api
      
      * add test eager for determinant op
      
      * [Yaml] Add assign yaml (#41428)
      
      * add assign yaml
      
      * add assign api
      
      * add assign backward api
      
      * add assign
      
      * add assign yaml
      
      * add assign
      
      * assign yaml
      
      * add assign raw kernel and use assign_raw in yaml
      
      * merge develop branch
      
      * add missing python_api
      
      * exchange assign and assign_raw kernel name (#41625)
      
      * exchange assign and assign_raw kernel name
      
      * fix register error
      
      * [Yaml]add gaussian_random yaml and test case (#41312)
      
      * add gaussian random yaml
      
      * add gaussian_random yaml and test case
      
      * fix error modify of full yaml
      
      * import in_dygraph_mode
      
      * import _in_legacy_dygraph
      
      * add place arg in api
      
      * import __current_expected_place
      
      * fix test_egr_python_api failed case
      
      * add test case
      
      * add cast for NormalInitializer
      
      * fix test error
      
      * fix test error
      
      * rm unused check code
      
      * fix test error in test_initializer_nn
      
      * modify by review
      
      * [Phi]fix split error when sections has 0 size and add test case (#41708)
      
      * fix split error when sections has 0 size and add test case
      
      * fix test case
      921a6fb7
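One item in the commit above fixes a split error when `sections` contains a size of 0. The plain-Python sketch below illustrates the behavior one would expect from such a split, assuming that a zero-size section should yield an empty piece rather than an error. The function `split_sections` is hypothetical and is not Paddle's API; the commit message does not describe the exact fix.

```python
def split_sections(values, sections):
    """Split a flat list into consecutive pieces of the given sizes.

    A section of size 0 produces an empty piece instead of failing.
    """
    if any(size < 0 for size in sections):
        raise ValueError("section sizes must be non-negative")
    if sum(sections) != len(values):
        raise ValueError("section sizes must sum to the input length")
    pieces = []
    offset = 0
    for size in sections:
        pieces.append(values[offset:offset + size])
        offset += size
    return pieces


parts = split_sections([1, 2, 3, 4, 5], [2, 0, 3])
```

Here the middle section has size 0, so `parts` holds three pieces with an empty list in the middle, and the elements after it keep their original order.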
    • W
      add fp16 kernel to clip_grad (#41675) · d447c678
      Committed by wuyefeilin
      d447c678
  20. 13 Apr, 2022 3 commits
  21. 12 Apr, 2022 3 commits
  22. 11 Apr, 2022 2 commits
    • H
      add depthwise conv hip support (#41537) (#41603) · 676c960c
      Committed by hong
      676c960c
    • C
      [Cherry-pick] Add truncated_normal/unique/swish/unbind yaml and polish Getting... · b2e095c4
      Committed by Chen Weihang
      [Cherry-pick] Add truncated_normal/unique/swish/unbind yaml and polish Getting tensor place impl (#41539)
      
      * [Phi] Polish truncated normal kernel and add yaml (#41280)
      
      * polish truncated normal kernel
      
      * add yaml
      
      * add truncated normal kernel and add yaml
      
      * polish unittests and yaml
      
      * import dygraph method
      
      * add unique yaml and final state api (#41460)
      
      * fix get tensor backend set bug (#41478)
      
      * [Phi] Add unbind yaml and final state api (#41277)
      
      * add unbind yaml
      
      * fix unittest
      
      * [Phi] Add swish yaml and final state api (#41479)
      
      * add swish yaml and final state api
      
      * skip mkldnn test
      
      * fix grad mkldnn test
      
      * add cherry-pick lost code
      b2e095c4