1. 25 Jul, 2022 1 commit
  2. 19 Jul, 2022 1 commit
    • C
      Record op shape data for profiler [cherry-pick PR43405 43578 43822] (#44384) · a2240190
      chenjian committed
      * add serialization for new field in event node (#43405)
      
      * add serialization for new field in event node
      
      * fix a bug
      
      * add more field to memory record (#43578)
      
      * Add infer shape in dygraph (#43822)
      
      * record memory and op supplement info
      
      * update
      
      * update
      
      * fix a bug
      
      * fix memory recording
      
      * fix a bug
      
      * update
      
      * update
      
      * fix a bug
      
      * update
      
      * fix a bug
      
      * fix a bug
      
      * fix a bug
      
      * update dygraph record
      
      * add infer shape record
      
      * fix
      
      * fix
      
      * fix
      
      * add comments
      
      * fix a bug
      
      * fix
      
      * fix
      
      * add record op info
      
      * fix file mode
      
      * add op input shape info
      
      * fix dependency
  3. 12 Jul, 2022 1 commit
  4. 30 Jun, 2022 2 commits
  5. 29 Jun, 2022 1 commit
  6. 28 Jun, 2022 2 commits
  7. 27 Jun, 2022 2 commits
  8. 25 Jun, 2022 1 commit
  9. 24 Jun, 2022 2 commits
    • A
      [cherry-pick] NVIDIA fixes (#43780) · 9edbe4aa
      Aganlengzi committed
      * Use all sitepackages path as the library/include path (#42940)
      
      * Fix several unit tests and increase the unit tests stability (#43670)
      
      * Reduce gather op unit tests size and increase the timeout
      
      * Add NVIDIA_TF32_OVERRIDE for multi-processes environment
      
      * Remove record test for device event ut
      
      * Fix 3 unittest errors (#43532)
      
      * Fix test_fuse_resnet_unit failure
      
      * Fix test_imperative_auto_mixed_precision failure
      
      * Fix sparse_attention_op error
      
      * Fix sparse_attention_op error
      
      * Use fixed random seed (#43659)
      
      * for CI test_collective_sendrecv_api
      Co-authored-by: zlsh80826 <rewang@nvidia.com>
      Co-authored-by: Shijie <505749828@qq.com>
    • W
      edff59b1
  10. 23 Jun, 2022 5 commits
  11. 22 Jun, 2022 6 commits
    • C
      Cherry pick 43307 (#43618) · d0bbf46c
      ccrrong committed
      * add bilinear_interp_v2 converter
      
      * update op_teller.cc
      
      * add unittest for bilinear_interp_v2 converter
      
      * code format
      
      * bug fix
      
      * code format and add unittest
      
      * remove merged modify in op_teller.cc
      
      * code format
      
      * code format
      
      * fix scale init error
    • X
      gpu_context (#43661) · 90ae3533
      xiaoxiaohehe001 committed
    • Y
      Optimize linspace to avoid GPU -> CPU copy. (#42750) (#43746) · 4dcfc6df
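      Yiqun Liu committed
      Cherry-pick of #42750.
      
      QA reported that after the #42750 optimization, the solov2 model's performance can improve by 6%, so the change is cherry-picked to 2.3. #41096 moved the linspace Python implementation from fluid.layers.tensor to paddle.tensor.creation, but that PR is not in the release/2.3 branch, so the Python changes from #42750 are applied to fluid.layers.tensor.linspace instead.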
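For readers unfamiliar with the op named in the commit above, here is a minimal plain-Python sketch of the values a linspace call produces. It is only an illustration: it is not Paddle's kernel, it does not reproduce the GPU-to-CPU copy that #42750 removes, and the function name `linspace_values` is hypothetical.

```python
def linspace_values(start, stop, num):
    """Return `num` evenly spaced values from start to stop, inclusive."""
    if num <= 0:
        raise ValueError("num must be positive")
    if num == 1:
        # With a single point, only the start value is returned.
        return [float(start)]
    # The spacing between consecutive values.
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]
```

Calling `linspace_values(0, 1, 5)` yields five values spaced 0.25 apart, starting at 0 and ending at 1.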
    • Z
      [cherry pick] Support optional residual add in fused ops and slice large... · 0660d5f2
      Zhang Ting committed
      [cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax (#43719)
      
       [cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax
      
      cherry-pick #43635 #43681 #43474
    • S
      test=document_fix;cherry pick code format check upgrade to release/2.3 (#43732) · 8e6a1945
      Sing_chan committed
          Only the format tool upgrades (clang-format, yapf, cmake-format) are cherry-picked to release/2.3. Lint tools such as cpplint are not moved, because cpplint errors will not be fixed in release/2.3.
          pre_commit.sh is also moved to release/2.3 so that both PR-CI-pre-commit and PR-CI-pre-commit-23 work.
          clang-format is pre-installed to avoid repeated installation caused by pre-commit's multi-threaded execution.
    • Z
      fix tensor copy bug (#43299) (#43728) · 8760817a
      zyfncg committed
  12. 21 Jun, 2022 4 commits
  13. 20 Jun, 2022 1 commit
  14. 17 Jun, 2022 2 commits
  15. 15 Jun, 2022 1 commit
  16. 14 Jun, 2022 1 commit
    • X
      [ CherryPick ] Cherry pick for einsum optimization. (#43468) · 22e75d92
      xiongkun committed
      * [EinsumOp] Polish forward logic and backward logic for optimize (#42603)
      
      * change logic for optimize
      
      * modify
      
      * merge
      
      * change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 (#43010)
      
      * [EinsumOp] Make EinsumOp support bfloat16. (#43085)
      
      * change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0
      
      * make EinsumOp support bf16
      
      * add unittest for BF16
      
      * add condition for test_BF16
      
      * fix bugs
      
      * fix
      
      * change the backward api to fit einsum op
  17. 09 Jun, 2022 1 commit
  18. 08 Jun, 2022 3 commits
  19. 07 Jun, 2022 1 commit
  20. 06 Jun, 2022 1 commit
    • N
      cherry-pick 42645 (#43205) · 835a1888
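      niuliling123 committed
      Remove the rank instantiation and the Elementwise call in the Broadcast function to reduce compile time.
      Adapted from PR #42645 on the develop branch. Because develop and release differ substantially, a direct cherry-pick was not possible, so a new PR was submitted for release 2.3.
      Instantiating Broadcast per rank causes heavy template expansion in the underlying code, which makes reduce_sum_grad_kernel.cu.o too large. The change reduces both the .o size and the compile time.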
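The commit above removes per-rank template instantiation. As a loose analogy only, the plain-Python sketch below computes a broadcast shape with the rank treated as a runtime value handled by one loop, instead of one specialized code path per rank. It assumes NumPy-style broadcasting rules and is not Paddle's Broadcast implementation; `broadcast_shape` is a hypothetical helper.

```python
def broadcast_shape(a, b):
    """Broadcast two shapes; the rank is an ordinary runtime value."""
    rank = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both shapes have the same rank.
    a = (1,) * (rank - len(a)) + tuple(a)
    b = (1,) * (rank - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"incompatible dimensions: {da} vs {db}")
    return tuple(out)
```

A single loop covers every rank here. In C++ templates, by contrast, each distinct rank used as a template argument produces its own generated code, which is the source of the object-file growth the commit message describes.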
  21. 31 May, 2022 1 commit
    • T
      Del check size (#43113) · 40a7e0ad
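      tianshuo78520a committed
      Remove the checks on the build directory size and the inference library size. These checks compare against develop, so differences are expected on release; they are therefore disabled for release tasks.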