1. 22 Apr, 2022 1 commit
    • [WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72
      Ming-Xu Huang committed
      * Fix leading dimension setting error in fused_gemm_epilogue_grad_op.
      
      * Add dyload to cuBlasLt functions.
      
      * Added cublasLtMatmulAlgoGetHeuristic to improve performance.
      
      * Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue
      
      * Added UTs to FLAGS_cublaslt_exhaustive_search_times
      
      * Added warmup runs in algo searching of Gemm epilogue.
      
      * Update copyright and documents.
      
      * Fixed error handling.
  2. 20 Apr, 2022 1 commit
  3. 19 Apr, 2022 3 commits
  4. 18 Apr, 2022 3 commits
  5. 15 Apr, 2022 3 commits
  6. 14 Apr, 2022 4 commits
  7. 13 Apr, 2022 5 commits
  8. 12 Apr, 2022 3 commits
  9. 11 Apr, 2022 2 commits
  10. 09 Apr, 2022 1 commit
  11. 08 Apr, 2022 2 commits
  12. 07 Apr, 2022 5 commits
  13. 06 Apr, 2022 1 commit
  14. 03 Apr, 2022 1 commit
    • add maximum limit for grid of index_select (#41127) · af8d2482
      FlyingQianMM committed
      * limit grid dim for index select
      
      * mv LimitGridDim into gpu_launch_config.h
      
      * fix conflicts
      
      * fix conflicts
      
      * fix code style
      
      * set block to 256
      
      * fix grid setting
      
      * set dtype of block_dim to unsigned int
  15. 01 Apr, 2022 2 commits
    • [Eager] Support pinned (#41035) · f3270fc8
      wanghuancoder committed
      * support pinned, test=develop
      
      * support async_write, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine,test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
    • support multi_layer of bilstm,*test=kunlun (#41151) · 00d23897
      z8hanghuan committed
      * support multi_layer of bilstm,*test=kunlun
      
      * support multi_layer of bilstm, *test=kunlun
      
      * support multi_layer of bilstm, *test=kunlun
      
      * support multi_layer of bilstm, *test=kunlun
  16. 31 Mar, 2022 3 commits
    • [new-exec] fit mkldnn op (#41058) · 02cf6764
      Leo Chen committed
      * fix bug that some op has no op_role attr
      
      * add mkldnn support for new executor
      
      * fit for mkldnn data_transfer
      
      * fit for mkldnn data_transfer
    • Maintain old profiler (#41132) · a6bf2218
      chenjian committed
      * no
      
      * maintain old profiler
      
      * exclude new python record events for old profiler
      
      * maintain old profiler
      
      * maintain
      
      * maintain old profiler
      
      * maintain
      
      * fix cmakes
    • Add time range duration display (#41029) · 6744754f
      chenjian committed
      * no
      
      * fix bugs
      
      * fix doc according to review
      
      * fix api doc format
      
      * fix api doc according to review
      
      * fix bug and add unit test
      
      * fix record event bug
      
      * optimize chrome tracing display
      
      * fix bug
      
      * add comment
      
      * add unit test
      
      * fix a bug
      
      * fix
      
      * fix
      
      * fix format