1. 30 Apr, 2021 9 commits
  2. 29 Apr, 2021 10 commits
  3. 28 Apr, 2021 8 commits
    • [NPU] add input EpsilonTensor for adam (#32605) · 119cda3d
      Committed by Leo Chen
      * add input EpsilonTensor for adam
      
      * update python api
      
      * add unit test
      
      * add npu test
      
      * add more ut
    • Added pure_bf16 mode (#32281) · bc379ca3
      Committed by arlesniak
    • Nne integration (#32604) · abcb3f54
      Committed by denglin-github
      * Add dlnne engine runtime
      
      * Fix log
      
      * Remove <const_cast> and remove unrelated modify with dlnne, +clang-format
      
      * Fix CMakeList format error
      
      * Add copyright message
      
      * Fix dlnne CMakeList.txt
      
      * Add some paddlepaddle_pass to support more networks
      
      * Fix some format bug
      
      * Add delete dropout_op pass
      
      * Fix some format bug
      
      * Fix format bug
    • [PsCore] solve Brpc dep (#32632) · 4ead9a5a
      Committed by Thunderbrook
      * Revert "Revert "[PsCore] optimize performance of large kv (#32535)" (#32599)"
      
      This reverts commit 809ac036.
      
      * brpc dep
    • Fix some error message (#32614) · 9ee709fc
      Committed by Kqnonrime
      * fix two error message
      
      * fix two error message
      
      * fix error
      
      * fix error
      
      * fix error
      
      * fix error
      
      * fix some error message
      
      * fix some error
      
      * fix error
      
      * fix some error
      
      * fix some error
      
      * fix some error
      
      * fix one error
      
      * fix some error
      
      * fix seven error message
      
      * fix error
      
      * fix error
      
      * fix error
      
      * fix error
      
      * fix some error message
      
      * fix error
      
      * fix some error
      
      * fix some error
    • [Rocm] fix test_var_base (#32639) · 7a245b7a
      Committed by zhulei
    • [oneDNN] Added clearing oneDNN cache per executor (#32499) · ba610761
      Committed by Jacek Czaja
      * - Added clearing oneDNN per executor

      * - Executor does not always have FLAGS_use_mkldnn set to true
    • Optimize update_loss_scaling_op (#32554) · 0dc02dc7
      Committed by jiangcheng
      * optimize update_loss_scaling_op by fusing the for loop into one kernel, test=develop

      * remove useless while loop and optimize variable name, test=develop

      * rename variable from out_addrs_tensor to out_addrs_mem, test=develop

      * improve variable-name readability by changing the prefix from t_ to local_
  4. 27 Apr, 2021 10 commits
  5. 26 Apr, 2021 3 commits
    • add send/recv api (#32504) · c47bafc6
      Committed by lilong12
      * add sendrecv, test=develop
    • Optimize where_index_op(prefix sum) (#30601) · 6ec4e640
      Committed by jiangcheng
      * new optimization for where_index_op using a prefix-sum version.

      * write a scan prefix sum kernel with stream for where index op.

      * optimize where_index by using cub::DeviceScan::InclusiveSum instead of the imperfect self-written kernel.

      * remove CheckTrue struct and rename stide_array for readability.

      * optimize variable names for readability.

      * optimize function names and annotations.
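
      The prefix-sum idea named in this commit can be illustrated with a minimal CPU sketch. This is not the code from the commit: the function name `where_index` below, the restriction to a 1-D mask, and the use of `itertools.accumulate` as a stand-in for `cub::DeviceScan::InclusiveSum` are all assumptions made for illustration. An inclusive scan over the 0/1 flags gives each truthy element its output slot, so each element can write its own index without depending on the others.

      ```python
      from itertools import accumulate


      def where_index(mask):
          """Return indices of truthy entries in a 1-D mask via an inclusive prefix sum.

          Hypothetical illustration only; not PaddlePaddle's implementation.
          """
          flags = [1 if m else 0 for m in mask]
          # Inclusive scan: scan[i] is the number of truthy entries in mask[0..i].
          scan = list(accumulate(flags))
          total = scan[-1] if scan else 0
          out = [0] * total
          # Each truthy element writes its index to slot scan[i] - 1.
          # On a GPU, each iteration could be handled by an independent thread.
          for i, flag in enumerate(flags):
              if flag:
                  out[scan[i] - 1] = i
          return out
      ```

      For example, `where_index([0, 1, 1, 0, 1])` yields `[1, 2, 4]`; the scan is `[0, 1, 2, 2, 3]`, so the three truthy positions land in slots 0, 1, and 2.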
    • [PsCore] optimize performance of large kv (#32535) · 4b7242b0
      Committed by Thunderbrook
      * optimize pull sparse
      
      * optimize pull sparse
      
      * change macro
      
      * format