1. 07 Aug, 2023 1 commit
    • [WIP] Integration flash attention 2 (#55758) · 0473369f
      Committed by umiswing
      * Work for fa-2 padded fwd. Code to be cleaned.
      
      * Work for fa2 unpadded fwd.
      
      * Work for padded-bwd, dk get small diff on np.random.seed(0)
      
      * Anyway I pass paddle's utest, except return softmax without dropout.
      
      * Clean code.
      
      * Modify interface.
      
      * Clean code and add some check.
      
      * Easy compile for dev.
      
      * Fix ci.
      
      * Fix ci-build.
      
      * Add std c++17 option again.
      
      * Limit max job when compiling fa2.
      
      * Remove const_cast
      
      * Add fwd params, to be cleaned.
      
      * Clean code.
      
      * Add bwd params.
      
      * Clean code.
      
      * Add enforce.
      
      * Use v2.0.4
      
      * Pass RNG state to fa2 capi
      
      * Fix review.
      
      * Add assert
      
      * Skip compile for sm less than 80.
  2. 05 Jun, 2023 1 commit
  3. 19 May, 2023 1 commit
    • Add flash attention to speedup fused_gate_attention. (#52731) · d29c1f8e
      Committed by limingshu
      * Reorganize the forward codes of flash-attention.
      
      * Fix forward.
      
      * Remove some unused code.
      
      * Simplify codes and fix backward.
      
      * Change all LOG(INFO) to VLOG and fix the backward.
      
      * Add scale for AF2 flash_attn; many thanks to xreki and shaojie for debugging this code
      
      * decrease the effect of debug print on performance
      
      * Unify the initialize of flashattn arguments.
      
      * Rewrite the reshape of temp_mask and temp_bias.
      
      * API support use_flash_attn.
      
      * Fix compiling error on CI.
      
      * Try to crop the flash-attention lib.
      
      * Correct the condition of whether can use flash-attn.
      
      * Remove the softmax_out argument.
      
      * Remove is_causal.
      
      * Polish codes.
      
      * Fix qkv_transpose_out's shape and scaling of Q * K.
      
      * Update commit of flash-attention.
      
      ---------
      Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
  4. 20 Apr, 2023 1 commit
  5. 29 Mar, 2023 1 commit
  6. 01 Mar, 2023 1 commit
    • Integration flash attention (#49869) · 61611786
      Committed by Chitsing KUI
      * flash attn
      
      * seed
      
      * almost
      
      * softmax
      
      * fix workspace
      
      * add unit test; linux only
      
      * fix setup
      
      * fix datatype include
      
      * fix setup typo
      
      * fix def scope
      
      * new error api
      
      * use paddle fork
      
      * fix attr bug; complete ut
      
      * update flash hash
      
      * fix rng reset
      
      * fix offset
      
      * fix comments