1. 25 Jul, 2022: 1 commit
  2. 19 Jul, 2022: 1 commit
    • Record op shape data for profiler [cherry-pick PR43405 43578 43822] (#44384) · a2240190
      chenjian committed
      * add serialization for new field in event node (#43405)
      
      * add serialization for new field in event node
      
      * fix a bug
      
      * add more field to memory record (#43578)
      
      * Add infer shape in dygraph (#43822)
      
      * record memory and op supplement info
      
      * update
      
      * update
      
      * fix a bug
      
      * fix memory recording
      
      * fix a bug
      
      * update
      
      * update
      
      * fix a bug
      
      * update
      
      * fix a bug
      
      * fix a bug
      
      * fix a bug
      
      * update dygraph record
      
      * add infer shape record
      
      * fix
      
      * fix
      
      * fix
      
      * add comments
      
      * fix a bug
      
      * fix
      
      * fix
      
      * add record op info
      
      * fix file mode
      
      * add op input shape info
      
      * fix dependency
  3. 12 Jul, 2022: 1 commit
  4. 24 Jun, 2022: 1 commit
    • [cherry-pick] NVIDIA fixes (#43780) · 9edbe4aa
      Aganlengzi committed
      * Use all sitepackages path as the library/include path (#42940)
      
      * Fix several unit tests and increase the unit tests stability (#43670)
      
      * Reduce gather op unit tests size and increase the timeout
      
      * Add NVIDIA_TF32_OVERRIDE for multi-processes environment
      
      * Remove record test for device event ut
      
      * Fix 3 unittest errors (#43532)
      
      * Fix test_fuse_resnet_unit failure
      
      * Fix test_imperative_auto_mixed_precision failure
      
      * Fix sparse_attention_op error
      
      * Fix sparse_attention_op error
      
      * Use fixed random seed (#43659)
      
      * for CI test_collective_sendrecv_api
      Co-authored-by: zlsh80826 <rewang@nvidia.com>
      Co-authored-by: Shijie <505749828@qq.com>
  5. 14 Jun, 2022: 1 commit
    • [ CherryPick ] Cherry pick for einsum optimization. (#43468) · 22e75d92
      xiongkun committed
      * [EinsumOp] Polish forward logic and backward logic for optimize (#42603)
      
      * change logic for optimize
      
      * modify
      
      * merge
      
      * change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 (#43010)
      
      * [EinsumOp] Make EinsumOp support bfloat16. (#43085)
      
      * change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0
      
      * make EinsumOp support bf16
      
      * add unittest for BF16
      
      * add condition for test_BF16
      
      * fix bugs
      
      * fix
      
      * change the backward api to fit einsum op
  6. 10 May, 2022: 2 commits
  7. 09 May, 2022: 1 commit
  8. 04 May, 2022: 1 commit
  9. 22 Apr, 2022: 1 commit
  10. 21 Apr, 2022: 2 commits
  11. 20 Apr, 2022: 1 commit
    • [Phi] Support construct Scalar by using Non-CPU Tensor (#41765) (#41963) · 3b25afb2
      YuanRisheng committed
      * support construct scalar using non-cpu tensor
      
      * fix bugs when run unittest
      
      * fix compile bugs
      
      * fix bugs when run ci
      
      * fix compile bugs
      
      * fix bugs when move copy
      
      * perfect unit test
      
      * perfect unittest
      
      * update according to comment
      
      * add target dependency
      
      * deal with conflict
      
      * fix bugs when run unit test
      
      * fix unit test bugs
  12. 19 Apr, 2022: 2 commits
  13. 15 Apr, 2022: 1 commit
  14. 13 Apr, 2022: 1 commit
  15. 12 Apr, 2022: 3 commits
  16. 08 Apr, 2022: 2 commits
  17. 06 Apr, 2022: 1 commit
  18. 03 Apr, 2022: 1 commit
    • add maximum limit for grid of index_select (#41127) · af8d2482
      FlyingQianMM committed
      * limit grid dim for index select
      
      * mv LimitGridDim into gpu_launch_config.h
      
      * fix conflicts
      
      * fix conflicts
      
      * fix code style
      
      * set block to 256
      
      * fix grid setting
      
      * set dtype of block_dim to unsigned int
  19. 01 Apr, 2022: 2 commits
    • [Eager] Support pinned (#41035) · f3270fc8
      wanghuancoder committed
      * support pinned, test=develop
      
      * support async_write, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine,test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
    • support multi_layer of bilstm,*test=kunlun (#41151) · 00d23897
      z8hanghuan committed
      * support multi_layer of bilstm,*test=kunlun
      
      * support multi_layer of bilstm, *test=kunlun
      
      * support multi_layer of bilstm, *test=kunlun
      
      * support multi_layer of bilstm, *test=kunlun
  20. 31 Mar, 2022: 3 commits
    • [new-exec] fit mkldnn op (#41058) · 02cf6764
      Leo Chen committed
      * fix bug that some op has no op_role attr
      
      * add mkldnn support for new executor
      
      * fit for mkldnn data_transfer
      
      * fit for mkldnn data_transfer
    • Maintain old profiler (#41132) · a6bf2218
      chenjian committed
      * no
      
      * maintain old profiler
      
      * exclude new python record events for old profiler
      
      * maintain old profiler
      
      * maintain
      
      * maintain old profiler
      
      * maintain
      
      * fix cmakes
    • Add time range duration display (#41029) · 6744754f
      chenjian committed
      * no
      
      * fix bugs
      
      * fix doc according to review
      
      * fix api doc format
      
      * fix api doc according to review
      
      * fix bug and add unit test
      
      * fix record event bug
      
      * optimize chrome tracing display
      
      * fix bug
      
      * add comment
      
      * add unit test
      
      * fix a bug
      
      * fix
      
      * fix
      
      * fix format
  21. 30 Mar, 2022: 3 commits
  22. 29 Mar, 2022: 1 commit
  23. 28 Mar, 2022: 1 commit
    • Fix profiler package bug (#40888) · 77a455c7
      chenjian committed
      * no
      
      * fix bugs
      
      * fix doc according to review
      
      * fix api doc format
      
      * fix api doc according to review
      
      * fix bug and add unit test
      
      * fix record event bug
  24. 27 Mar, 2022: 1 commit
  25. 25 Mar, 2022: 2 commits
  26. 23 Mar, 2022: 3 commits
    • [NPU] add npu support for conv3d and conv3d_grad (#38480) · ff568afa
      furnace committed
      * [NPU] add npu support for conv3d and conv3d_grad
      
      * [NPU] delete failed unittests due to Ascend not support
      
      * [NPU] delete debug codes
      
      * [NPU] optimize codes, notest
      
      * [NPU] remove const_cast
      
      * [NPU] optimize for remove const_cast
      
      * [NPU] fix written errors
    • Performance optimization for StreamSafeCudaAllocator (#40718) · d8bff988
      From00 committed
      * Performance optimize
      
      * Optimize GetAllocator, RWLock and ProcessUnfreedAllocation
      
      * Remove test file
      
      * Fix CI error
      
      * Fix CI errors
      
      * Fix CI errors
    • Add profiler features (#40357) · c15e3823
      chenjian committed
      * add event record for model profiling
      
      * fix format
      
      * fix format
      
      * fix code example bug
      
      * no
      
      * add profiler statistic
      
      * add profiler feature
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * required: gpu
      
      * required: gpu
      
      * fix bug
      
      * required: gpu
      
      * fix ci bug
      
      * fix ci error
      
      * fix ci error
      
      * upgrade document
      
      * fix doc
      
      * fix ci bug
      
      * add doc and fix bug
      
      * nothing
      
      * fix bug
      
      * fix format bug
      
      * modify format
      
      * add deprecated description for old profiler
      
      * fix bug
      
      * fix bug
      
      * fix
      
      * add load_profiler_result doc
      
      * add load_profiler_result doc
      
      * add load_profiler_result doc
      
      * help fix old profiler sample code
      
      * add api doc
      
      * fix format
      
      * fix api doc
      
      * fix api doc format
      
      * fix api doc format
      
      * fix api doc format
      
      * fix api doc format