1. May 23, 2023 (6 commits)
  2. May 22, 2023 (25 commits)
  3. May 20, 2023 (3 commits)
  4. May 19, 2023 (6 commits)
    • [Inference] Save optimized model by pass (#53696) · fa08a514
      Committed by shentanyue
    • Improve stability of Paddle-TensorRT FP16 UT (#51554) · 645e81f0
      Committed by Frank Lin
      * Improve Readability and Overall Clarity of Logging
      
      * Adds the set_input_type API for specifying input data types
      
      * Specifying input data types
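      The entry above mentions a set_input_type helper for declaring input data types in the Paddle-TensorRT FP16 unit tests, but the log does not show its signature. The snippet below is only a hypothetical sketch of how such a helper could drive dtype selection for generated test inputs; the class and method names are illustrative assumptions, not the actual test-framework API.

      ```python
      # Hypothetical sketch: not the real Paddle-TensorRT UT API.
      import numpy as np

      class TrtFp16Case:
          def __init__(self):
              # Default to FP32 inputs; FP16 runs override this via set_input_type().
              self._input_dtype = np.float32

          def set_input_type(self, dtype):
              """Record the dtype that generated test inputs should use."""
              self._input_dtype = dtype

          def make_input(self, shape):
              # Seeded generator keeps FP32 and FP16 runs of the same test comparable.
              rng = np.random.default_rng(seed=0)
              return rng.standard_normal(shape).astype(self._input_dtype)

      case = TrtFp16Case()
      case.set_input_type(np.float16)
      print(case.make_input((1, 3, 224, 224)).dtype)  # float16
      ```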
    • [XPU] fix fallback (#53801) · 4b85e5db
      Committed by wz1qqx
    • add minimum grad composite rules (#52561) · 97690816
      Committed by warrentdrew
      * add minimum grad composite rules
      
      * add public python api
      
      * fix format
      
      * fix format
      
      * update testcase
      
      * fix testcase
      
      * fix format
      
      * fix cmakelist.txt
      
      * fix format
      
      * fix param problem
      
      * fix op and composite rule
      
      * fix bf16 cpu support problem
      
      * fix bf16 cpu issue
      
      * fix axis error log
      
      * add axis for maximum
      
      * revert commit
      
      * remove .orig
      
      * fix generic problem
      
      * revert max op
      
      * fix axis error
      
      * fix maximum axis
      
      * fix test_check_output
      
      * fix cinn
      
      * fix minimum maximum axis check
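      As a side note on what a composite gradient rule for minimum(x, y) decomposes into: the incoming gradient is routed to whichever input supplied the smaller value. The numpy sketch below illustrates only the elementwise math for equal-shaped inputs (the broadcasting/axis handling that several bullets above deal with is omitted); the function name is illustrative, not the operator registered by this PR.

      ```python
      import numpy as np

      def minimum_grad_composite(x, y, out_grad):
          # True where x supplied the minimum; ties go to x (the x <= y convention).
          x_is_min = x <= y
          grad_x = out_grad * x_is_min.astype(out_grad.dtype)
          grad_y = out_grad * (~x_is_min).astype(out_grad.dtype)
          return grad_x, grad_y

      x = np.array([1.0, 5.0, 2.0])
      y = np.array([3.0, 4.0, 2.0])
      gx, gy = minimum_grad_composite(x, y, np.ones_like(x))
      print(gx, gy)  # [1. 0. 1.] [0. 1. 0.]
      ```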
    • Add flash attention to speed up fused_gate_attention. (#52731) · d29c1f8e
      Committed by limingshu
      * Reorganize the forward codes of flash-attention.
      
      * Fix forward.
      
      * Remove some unused code.
      
      * Simplify codes and fix backward.
      
      * Change all LOG(INFO) to VLOG and fix the backward.
      
      * Add scale for AF2 flash_attn; many thanks to xreki and shaojie for debugging this code.
      
      * Decrease the effect of debug printing on performance.
      
      * Unify the initialize of flashattn arguments.
      
      * Rewrite the reshape of temp_mask and temp_bias.
      
      * Add API support for use_flash_attn.
      
      * Fix compiling error on CI.
      
      * Try to crop the flash-attention lib.
      
      * Correct the condition for whether flash-attn can be used.
      
      * Remove the softmax_out argument.
      
      * Remove is_causal.
      
      * Polish codes.
      
      * Fix qkv_transpose_out's shape and scaling of Q * K.
      
      * Update commit of flash-attention.
      
      ---------
      Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
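      For reference, the computation that the flash-attention path in fused_gate_attention fuses is standard scaled dot-product attention, with the 1/sqrt(head_dim) scaling of Q*K^T that the bullets above mention fixing. The numpy sketch below shows only that reference math (including an optional additive bias/mask); it is not the fused kernel, and the function names are illustrative.

      ```python
      import numpy as np

      def softmax(x, axis=-1):
          x = x - x.max(axis=axis, keepdims=True)  # stabilize before exp
          e = np.exp(x)
          return e / e.sum(axis=axis, keepdims=True)

      def attention_reference(q, k, v, bias=None):
          # q, k, v: [batch, heads, seq_len, head_dim]
          scale = 1.0 / np.sqrt(q.shape[-1])
          scores = np.einsum("bhqd,bhkd->bhqk", q, k) * scale
          if bias is not None:
              scores = scores + bias  # e.g. attention mask or pair bias
          return np.einsum("bhqk,bhkd->bhqd", softmax(scores), v)

      q = k = v = np.random.randn(2, 4, 8, 16).astype(np.float32)
      print(attention_reference(q, k, v).shape)  # (2, 4, 8, 16)
      ```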