1. 28 8月, 2023 1 次提交
    • S
      [Phi] move shuffle_batch to phi (#56547) · 30708028
      Sonder 提交于
      * move shuffle_batch to phi
      
      * remove useless codes
      
      * add test_shuffle_batch_op to STATIC_BUILD_TESTS
      
      * move shuffle_batch_kernel.cc to cpu folder
      
      * move shuffle_batch_grad to phi
      
      * rm shuffle_batch_op.h
      
      * change year at file head
      30708028
  2. 25 8月, 2023 1 次提交
  3. 24 8月, 2023 2 次提交
  4. 23 8月, 2023 2 次提交
  5. 22 8月, 2023 2 次提交
  6. 21 8月, 2023 3 次提交
  7. 16 8月, 2023 3 次提交
  8. 15 8月, 2023 5 次提交
  9. 14 8月, 2023 2 次提交
  10. 10 8月, 2023 1 次提交
  11. 09 8月, 2023 1 次提交
  12. 08 8月, 2023 3 次提交
  13. 07 8月, 2023 3 次提交
    • Y
      Add attn_mask supported for FlashAttnKernel. (#55969) · 42e0c6b8
      yin wei 提交于
      * add mask
      
      * add backword
      
      * add enforce info
      
      * update scale
      
      * integrate code
      
      * update enforce
      
      * add enforce eq
      
      * add error type
      
      * update enforce
      
      * add test_flash_attention
      
      * Polish codes and fix compiling errors.
      
      * Set num_splits to 0 for flash-attn with tensor mask.
      
      * Fix the compiling error for non flash-attn case.
      
      ---------
      Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
      42e0c6b8
    • R
      30a02d27
    • U
      [WIP] Integration flash attention 2 (#55758) · 0473369f
      umiswing 提交于
      * Work for fa-2 padded fwd. Code to be cleaned.
      
      * Work for fa2 unpadded fwd.
      
      * Work for padded-bwd, dk get small diff on np.random.seed(0)
      
      * Anyway I pass paddle's utest, except return softmax without dropout.
      
      * Clean code.
      
      * Modify interface.
      
      * Clean code and add some check.
      
      * Easy compile for dev.
      
      * Fix ci.
      
      * Fix ci-build.
      
      * Add std c++17 option again.
      
      * Limit max job when compiling fa2.
      
      * Remove const_cast
      
      * Add fwd params, to be cleaned.
      
      * Clean code.
      
      * Add bwd params.
      
      * Clean code.
      
      * Add enforce.
      
      * Use v2.0.4
      
      * Pass RNG state to fa2 capi
      
      * Fix review.
      
      * Add assert
      
      * Skip compile for sm less than 80.
      0473369f
  14. 04 8月, 2023 1 次提交
    • K
      [NewIR] Rename feed with place to data (#55778) · 274e5e54
      kangguangli 提交于
      * fix bug: feed_with_place should consider variable existence
      
      * fix
      
      * fix build scope
      
      * change method to set feed var name
      
      * remove feed_with_place to placeholder
      
      * fix
      
      * rename to data
      
      * fix
      
      * fix
      274e5e54
  15. 03 8月, 2023 2 次提交
  16. 02 8月, 2023 3 次提交
    • Y
      [Inference] Replace groupNorm when data types are bf16 and fp16, and data... · e61d892a
      yangjianfengo1 提交于
      [Inference] Replace groupNorm when data types are bf16 and fp16, and data format is NHWC implementation. (#55399)
      
      * finish
      
      * cpergroup odd
      
      * fix bf16
      
      * single channel
      
      * code style
      
      * jingdu duiqi
      
      * add head_file
      
      * add bf16 head file
      
      * bf16 2
      
      * bf16
      
      * bf16 head
      
      * bf16 compile
      
      * py test
      
      * bf16 compile
      
      * bf16 compile
      
      * unset py test
      
      * nhwc
      
      * test
      
      * mean var
      
      * bf16 success
      
      * su
      
      * ctest success
      
      * use is_same_as
      
      * is_same
      
      * use is_same
      
      * rtol
      
      * gpu_stream
      
      * del sigmod
      
      * fix bfloat16 type
      
      * use cuda_bf16_hpp
      
      * use_cuda_arch
      
      * bfloat162float2
      
      * del inplace_tol
      
      * del max_releative_tol
      
      * temp store
      
      * jingdu duiqi
      
      * temp store
      
      * plugin
      
      * jingdu duiqi
      
      * duiqi
      
      * include cuda.h
      
      * del half
      
      * half single
      
      * ci
      
      * add const
      
      * ci
      
      * cudamemset
      
      * del printf
      
      * fp16 test
      
      * add half compute
      
      * del br16 ci
      
      * del ci
      
      * ci approve
      
      * del fluid include
      e61d892a
    • C
      Add FP16 & BF16 for erfinv (#55287) · 6d7efd09
      cyberslack_lee 提交于
      6d7efd09
    • W
      fix security bug (#55782) · 19da5c0c
      wanghuancoder 提交于
      * fix security bug
      19da5c0c
  17. 01 8月, 2023 3 次提交
  18. 31 7月, 2023 2 次提交