1. 10 Aug, 2023 2 commits
    • L
      Add variable_length_memory_efficient_attention (#55400) · 4036c937
      lzy committed
      * add variable_length_memory_efficient_attention
      * update variable_length_memory_efficient_attention unittest
      * update variable_length_mem_eff_attn's docs and unittest
      * update variable_length_mem_eff_attn's docs
      * Update test_variable_length_memory_efficient_attention.py
      * Update variable_length_memory_efficient_attention.cu
      * fix codestyle
      * fix variable_length_fmha's docs and unittest
      * fix variable_length_fmha's docs
    • Y
      fix A100 fused linear grad add ut bug (#56136) · b561a05e
      Yuang Liu committed
  2. 09 Aug, 2023 6 commits
    • X
      [Paddle Inference] Set softmax op use_cudnn default true. (#56036) · 4f2cf7fb
      xiaoxiaohehe001 committed
      * fix_softmax_eigen
      
      * fix_ctest_seresnet
      
      * fix_ci_error
    • C
      Add FP16 & BF16 for nanmedian (#56056) · 4ae9945b
      cyberslack_lee committed
    • U
      Fix select sdp for FA-2 (#56045) · 08e46d6f
      umiswing committed
    • N
      change index's dtype for int to int64 (#55949) · 8d181e37
      niuliling123 committed
    • K
      [NewIR] minor fix about new ir test (#56075) · a127d7c8
      kangguangli committed
      * fix bugs about new ir test
      
      * enable dy2st newir test in all cases
      
      * fix
    • L
      remove the... · 723c6f77
      LoneRanger committed
      Remove the AdamOptimizer, SGDOptimizer, MomentumOptimizer, ModelAverage, LookaheadOptimizer, FtrlOptimizer, DecayedAdagradOptimizer, and DpsgdOptimizer in fluid; relocate the ExponentialMovingAverage, PipelineOptimizer, and GradientMergeOptimizer; and change the optimizer base for LarsMomentumOptimizer and RecomputeOptimizer (#55970)
      
      * change the optimizer base for SGDOptimizer
      
      * change the optimizer base for SGDOptimizer
      
      * replace the SGDOptimizer with SGD
      
      * fix bug of sgd
      
      * change the optimizer base for MomentumOptimizer
      
      * fix the remaining tests
      
      * remove the Momentum in fluid/optimizer.py
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * Update test_resnet_cinn.py
      
      * Update test_resnet_prim_cinn.py
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * remove the ModelAverage in fluid
      
      * remove the LookaheadOptimizer in fluid
      
      * fix bug
      
      * remove AdamOptimizer in fluid
      
      * Update test_image_classification_fp16.py
      
      * fix bug
      
      * relocate the ExponentialMovingAverage in fluid
      
      * restore the static api
      
      * remove the FtrlOptimizer in fluid
      
      * remove the DecayedAdagradOptimizer in fluid
      
      * remove the DpsgdOptimizer in fluid
      
      * fix bug
      
      * fix codestyle
      
      * fix bug
      
      * fix bug
      
      * relocate the PipelineOptimizer
      
      * relocate the GradientMergeOptimizer
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * fix doc
      
      * Update __init__.py
      
      * Update test_fleet_qat_meta_optimizer.py
      
      * change optimizer base for LarsMomentumOptimizer
      
      * fix bug
      
      * fix conflict
      
      * fix code-style
      
      * fix sample codes
      
      * fix bug
      
      * fix bug
      
      * fix cinn bug
      
      * fix bug
      
      * fix bug
      
      * Update qat_optimizer.py
      
      * Update __init__.py
      
      * fix bug
      
      * change optimizer base for RecomputeOptimizer
      
      * fix bug
      
      * fix bug
      
      * Update test_imperative_optimizer_v2.py
  3. 08 Aug, 2023 3 commits
  4. 07 Aug, 2023 6 commits
    • Y
      Add attn_mask supported for FlashAttnKernel. (#55969) · 42e0c6b8
      yin wei committed
      * add mask
      
      * add backward
      
      * add enforce info
      
      * update scale
      
      * integrate code
      
      * update enforce
      
      * add enforce eq
      
      * add error type
      
      * update enforce
      
      * add test_flash_attention
      
      * Polish codes and fix compiling errors.
      
      * Set num_splits to 0 for flash-attn with tensor mask.
      
      * Fix the compiling error for non flash-attn case.
      
      ---------
      Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
    • C
      Fix typos (#56008) · 4d094b0c
      co63oc committed
    • T
      Test Del paddle_bfloat (#55904) · 8fc2366c
      tianshuo78520a committed
      * Test Del paddle_bfloat
      
      * Del paddle_bfloat test
    • Y
      Increase absolute error of test_group_norm_op (#55992) · 496de7f3
      yangjianfengo1 committed
      * inplace tol
      
      * code style
    • C
      Fix typos, test=document_fix (#56005) · f55f601e
      co63oc committed
    • U
      [WIP] Integration flash attention 2 (#55758) · 0473369f
      umiswing committed
      * Work for fa-2 padded fwd. Code to be cleaned.
      
      * Work for fa2 unpadded fwd.
      
      * Work for padded-bwd, dk get small diff on np.random.seed(0)
      
      * Anyway I pass paddle's utest, except return softmax without dropout.
      
      * Clean code.
      
      * Modify interface.
      
      * Clean code and add some check.
      
      * Easy compile for dev.
      
      * Fix ci.
      
      * Fix ci-build.
      
      * Add std c++17 option again.
      
      * Limit max job when compiling fa2.
      
      * Remove const_cast
      
      * Add fwd params, to be cleaned.
      
      * Clean code.
      
      * Add bwd params.
      
      * Clean code.
      
      * Add enforce.
      
      * Use v2.0.4
      
      * Pass RNG state to fa2 capi
      
      * Fix review.
      
      * Add assert
      
      * Skip compile for sm less than 80.
  5. 04 Aug, 2023 4 commits
    • D
      replace embedding in fluid with 2.0 version (#55757) · 2d91a9bd
      Difer committed
      * replace embedding
      
      * replace sparse_embedding
      
      * fix some bugs
      
      * del embedding
      
      * replace layers.embedding
      
      * fix type error
    • H
      [NewIR]New ir aot placement refactor (#55810) · dd1379e4
      hong committed
      * refactor aot
      
      * update
      
      * fix bugs
      
      * remove some test
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * update
    • J
      Support Combined indexing for __getitem__ and __setitem__ (#55211) · 697c712f
      JYChen committed
      * WIP: start writing combined indexing get
      
      * list/tuple/Variable
      
      * getitem 80%
      
      * add setitem
      
      * add some unittest for setitem
      
      * lazy import
      
      * fix some setitem error
      
      * fix advance indexing with decreasing axes; fix strided_slice input name
      
      * combine int-tensor getitem is ok (without boolean support & broadcast); add getitem unittest for static
      
      * add broadcast & parse bool tensor for __getitem
      
      * [change getitem] _getitem_impl_ to _getitem_static, not deleting the former one
      
      * refine new getitem; fix ut in variable/var_base
      
      * add __getitem__ ut in dygraph
      
      * re-dispatch getitem for Py/CPP; fix strided_slice decrease axes error in dygraph
      
      * fix ut; support tensor in slice
      
      * [change setitem] _setitem_impl_ to _setitem_static, not deleting the former one
      
      * remove some UT (for some, temporarily)
      
      * add IndexError to solve timeout problem in static-mode
      
      * 1. temporarily forbid all-False bool-indexput; 2. setitem_static will return new variable
      
      * xpu uses old strategy
      
      * rename dy2st setitem ut to avoid same-name problem
      
      * dy2st for new combined index
      
      * ut case for combine-index with dy2st
      
      * open ut with all-false-bool setitem
      
      * remove useless doc and _getitem_impl_
      
      * change static res
      
      * fix static xpu
  6. 03 Aug, 2023 2 commits
  7. 02 Aug, 2023 5 commits
  8. 01 Aug, 2023 1 commit
  9. 31 Jul, 2023 7 commits
  10. 30 Jul, 2023 1 commit
  11. 28 Jul, 2023 3 commits