1. 05 Aug, 2022 1 commit
  2. 04 Aug, 2022 3 commits
  3. 02 Aug, 2022 2 commits
    • [cherry-pick]Ort backend optimizer(#44136 #44703 #44724) (#44766) · 35297bd8
      heliqi committed
      * [Inference]ort backend optimizer (#44136)
      
      * add ort clone interface
      
      * paddle2onnx update to 1.0.0rc
      
      * ort input_tensor use mutable data of scope
      
      * clone ort_predictor reuse session (#44703)
      
      * ort backend support output mutable data (#44724)
      
      * 2.3 interface is different from the Develop interface
      
      * 2.3 interface is different from the Develop interface
      
      * 2.3 interface is different from the Develop interface
    • Fix operator type record in profiler [cherry-pick PR44582] (#44654) · 6de20581
      chenjian committed
      * fix record event for operator type in new dygraph (#44582)
      
      * fix new dygraph record event for op
      
      * update unit test
      
      * fix file mode
  4. 25 Jul, 2022 1 commit
  5. 19 Jul, 2022 1 commit
    • Record op shape data for profiler [cherry-pick PR43405 43578 43822] (#44384) · a2240190
      chenjian committed
      * add serialization for new field in event node (#43405)
      
      * add serialization for new field in event node
      
      * fix a bug
      
      * add more field to memory record (#43578)
      
      * Add infer shape in dygraph (#43822)
      
      * record memory and op supplement info
      
      * update
      
      * update
      
      * fix a bug
      
      * fix memory recording
      
      * fix a bug
      
      * update
      
      * update
      
      * fix a bug
      
      * update
      
      * fix a bug
      
      * fix a bug
      
      * fix a bug
      
      * update dygraph record
      
      * add infer shape record
      
      * fix
      
      * fix
      
      * fix
      
      * add comments
      
      * fix a bug
      
      * fix
      
      * fix
      
      * add record op info
      
      * fix file mode
      
      * add op input shape info
      
      * fix dependency
  6. 12 Jul, 2022 1 commit
  7. 30 Jun, 2022 2 commits
  8. 29 Jun, 2022 1 commit
  9. 28 Jun, 2022 1 commit
  10. 27 Jun, 2022 2 commits
  11. 25 Jun, 2022 1 commit
  12. 24 Jun, 2022 1 commit
    • [cherry-pick] NVIDIA fixes (#43780) · 9edbe4aa
      Aganlengzi committed
      * Use all sitepackages path as the library/include path (#42940)
      
      * Fix several unit tests and increase the unit tests stability (#43670)
      
      * Reduce gather op unit tests size and increase the timeout
      
      * Add NVIDIA_TF32_OVERRIDE for multi-processes environment
      
      * Remove record test for device event ut
      
      * Fix 3 unittest errors (#43532)
      
      * Fix test_fuse_resnet_unit failure
      
      * Fix test_imperative_auto_mixed_precision failure
      
      * Fix sparse_attention_op error
      
      * Fix sparse_attention_op error
      
      * Use fixed random seed (#43659)
      
      * for CI test_collective_sendrecv_api
      Co-authored-by: zlsh80826 <rewang@nvidia.com>
      Co-authored-by: Shijie <505749828@qq.com>
  13. 23 Jun, 2022 4 commits
  14. 22 Jun, 2022 4 commits
    • Cherry pick 43307 (#43618) · d0bbf46c
      ccrrong committed
      * add bilinear_interp_v2 converter
      
      * update op_teller.cc
      
      * add unittest for bilinear_interp_v2 converter
      
      * code format
      
      * bug fix
      
      * code format and add unittest
      
      * remove merged modify in op_teller.cc
      
      * code format
      
      * code format
      
      * fix scale init error
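    • Optimize linspace to avoid GPU -> CPU copy. (#42750) (#43746) · 4dcfc6df
      Yiqun Liu committed
      Cherry-pick of #42750.
      
      QA reports that the #42750 optimization improves solov2 model performance by 6%, so it is cherry-picked to 2.3. Because #41096 moved the Python implementation of linspace from fluid.layers.tensor to paddle.tensor.creation, and that PR is not in the release/2.3 branch, the Python changes from #42750 are applied to fluid.layers.tensor.linspace instead.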
    • [cherry pick] Support optional residual add in fused ops and slice large... · 0660d5f2
      Zhang Ting committed
      [cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax (#43719)
      
       [cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax
      
      cherry-pick #43635 #43681 #43474
    • test=document_fix;cherry pick code format check upgrade to release/2.3 (#43732) · 8e6a1945
      Sing_chan committed
          Only the format tool upgrades (clang-format, yapf, cmake-format) are cherry-picked to release/2.3; lint tools such as cpplint are not moved, because cpplint errors will not be fixed in release/2.3.
          pre_commit.sh is also moved to release/2.3 so that both PR-CI-pre-commit and PR-CI-pre-commit-23 can work.
          clang-format is pre-installed to avoid repeated installation caused by pre-commit's multi-threaded runs.
  15. 21 Jun, 2022 4 commits
  16. 20 Jun, 2022 1 commit
  17. 17 Jun, 2022 2 commits
  18. 15 Jun, 2022 1 commit
  19. 14 Jun, 2022 1 commit
    • [ CherryPick ] Cherry pick for einsum optimization. (#43468) · 22e75d92
      xiongkun committed
      * [EinsumOp] Polish forward logic and backward logic for optimize (#42603)
      
      * change logic for optimize
      
      * modify
      
      * merge
      
      * change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 (#43010)
      
      * [EinsumOp] Make EinsumOp support bfloat16. (#43085)
      
      * change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0
      
      * make EinsumOp support bf16
      
      * add unittest for BF16
      
      * add condition for test_BF16
      
      * fix bugs
      
      * fix
      
      * change the backward api to fit einsum op
  20. 09 Jun, 2022 1 commit
  21. 08 Jun, 2022 2 commits
  22. 06 Jun, 2022 1 commit
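    • cherry-pick 42645 (#43205) · 835a1888
      niuliling123 committed
      Remove the per-rank instantiation and the Elementwise call in the Broadcast function to reduce compile time.
      Adapted from PR #42645 on the develop branch. Because the develop and release branches have diverged substantially, a direct cherry-pick is not possible, so the PR is resubmitted against release 2.3.
      Instantiating Broadcast per rank causes heavy expansion of the underlying templates and makes reduce_sum_grad_kernel.cu.o too large; this change reduces the .o size and the compile time.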
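The commit above trades compile-time rank specialization for a smaller object file. The following is a minimal sketch of that general idea and not Paddle's actual code: `BroadcastStatic` and `BroadcastDynamic` are hypothetical names, and the CPU loop stands in for the real GPU kernel. The templated variant is compiled once for every rank that is used, while the runtime-rank variant is compiled only once.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical variant A: rank is a template parameter. The compiler emits a
// separate copy of this function for every Rank value that is instantiated,
// which is what inflates object size and compile time.
template <std::size_t Rank>
std::vector<float> BroadcastStatic(const std::vector<float>& in,
                                   const std::array<int, Rank>& in_dims,
                                   const std::array<int, Rank>& out_dims) {
  std::array<std::size_t, Rank> in_strides{};
  std::size_t stride = 1, out_size = 1;
  for (std::size_t d = Rank; d-- > 0;) {
    // A stride of 0 repeats the single element along a size-1 axis.
    in_strides[d] = (in_dims[d] == 1) ? 0 : stride;
    stride *= static_cast<std::size_t>(in_dims[d]);
    out_size *= static_cast<std::size_t>(out_dims[d]);
  }
  std::vector<float> out(out_size);
  for (std::size_t i = 0; i < out_size; ++i) {
    std::size_t rem = i, src = 0;
    for (std::size_t d = Rank; d-- > 0;) {
      const std::size_t extent = static_cast<std::size_t>(out_dims[d]);
      src += (rem % extent) * in_strides[d];
      rem /= extent;
    }
    out[i] = in[src];
  }
  return out;
}

// Hypothetical variant B: rank is a runtime value taken from the dims vector,
// so there is exactly one compiled copy regardless of how many ranks occur.
std::vector<float> BroadcastDynamic(const std::vector<float>& in,
                                    const std::vector<int>& in_dims,
                                    const std::vector<int>& out_dims) {
  const std::size_t rank = out_dims.size();
  std::vector<std::size_t> in_strides(rank);
  std::size_t stride = 1, out_size = 1;
  for (std::size_t d = rank; d-- > 0;) {
    in_strides[d] = (in_dims[d] == 1) ? 0 : stride;
    stride *= static_cast<std::size_t>(in_dims[d]);
    out_size *= static_cast<std::size_t>(out_dims[d]);
  }
  std::vector<float> out(out_size);
  for (std::size_t i = 0; i < out_size; ++i) {
    std::size_t rem = i, src = 0;
    for (std::size_t d = rank; d-- > 0;) {
      const std::size_t extent = static_cast<std::size_t>(out_dims[d]);
      src += (rem % extent) * in_strides[d];
      rem /= extent;
    }
    out[i] = in[src];
  }
  return out;
}
```

Both variants compute the same result for the same shapes. The runtime-rank form gives up loop bounds known at compile time, so whether it costs any run-time performance in a real kernel would need to be measured.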
  23. 30 May, 2022 2 commits