1. 09 Aug, 2023 · 2 commits
  2. 08 Aug, 2023 · 1 commit
  3. 07 Aug, 2023 · 2 commits
    • [cherry-pick] Integration flash attention 2 (#56015) · cc9a7688
      umiswing committed
      * [FlashAttn] add flash randomness control (#52902)
      
      * add flash randomness control
      
      * fix VLOG undefined
      
      * [WIP] Integration flash attention 2 (#55758)
      
      * Work for fa-2 padded fwd. Code to be cleaned.
      
      * Work for fa2 unpadded fwd.
      
      * Work for padded-bwd, dk get small diff on np.random.seed(0)
      
      * Anyway I pass paddle's utest, except return softmax without dropout.
      
      * Clean code.
      
      * Modify interface.
      
      * Clean code and add some check.
      
      * Easy compile for dev.
      
      * Fix ci.
      
      * Fix ci-build.
      
      * Add std c++17 option again.
      
      * Limit max job when compiling fa2.
      
      * Remove const_cast
      
      * Add fwd params, to be cleaned.
      
      * Clean code.
      
      * Add bwd params.
      
      * Clean code.
      
      * Add enforce.
      
      * Use v2.0.4
      
      * Pass RNG state to fa2 capi
      
      * Fix review.
      
      * Add assert
      
      * Skip compile for sm less than 80.
      
      ---------
      Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
    • cherry-pick fused_rope from develop (#55931) · 8d3a9882
      niuliling123 committed
      * Add fused_rope forward op (#54351)
      
      * style
      
      * more
      
      * update ctest
      
      * Update legacy_backward.yaml
      
      * Update legacy_ops.yaml
      
      * Update legacy_ops.yaml
      
      * update
      
      * update
      
      * update for move
      
      * Update the rope op according to the comments (#54985)
      
      * Update multiary.cc
      
      * Update __init__.py
      
      * for int64_t and assert
      
      * more
      
      * remove useless assert first
      
      ---------
      Co-authored-by: sneaxiy <sneaxiy@126.com>
  4. 22 Jul, 2023 · 1 commit
  5. 18 Jul, 2023 · 1 commit
  6. 13 Jul, 2023 · 2 commits
  7. 04 Jul, 2023 · 1 commit
  8. 29 Jun, 2023 · 1 commit
  9. 14 Jun, 2023 · 1 commit
    • support sharding stage1 (#54069) · 974676bc
      pangengzheng committed
      * support sharding stage1
      
      * fix unittest
      
      * format
      
      * pass sharded sharding params_and_grads to inner_opt apply_optimize
      
      * change sharding gradient allreduce to reduce
      
      * support save state_dict adaptively and support sharding with mp
      
      * fix sharding test
      
      * test set_state_dict
      
      * add more unit test
      
      * fix global norm of mp case
      
      * polish
      
      * hack to calculate global norm in order to remove diff in calculating global norm values in HybridParallelClipGrad compared to dp
      
      * remove print
  10. 08 Jun, 2023 · 3 commits
  11. 29 May, 2023 · 1 commit
  12. 26 May, 2023 · 1 commit
  13. 23 May, 2023 · 3 commits
  14. 19 May, 2023 · 1 commit
  15. 15 May, 2023 · 3 commits
  16. 14 May, 2023 · 1 commit
  17. 12 May, 2023 · 1 commit
  18. 11 May, 2023 · 1 commit
  19. 29 Apr, 2023 · 1 commit
  20. 28 Apr, 2023 · 1 commit
  21. 27 Apr, 2023 · 1 commit
  22. 26 Apr, 2023 · 4 commits
  23. 25 Apr, 2023 · 1 commit
  24. 24 Apr, 2023 · 1 commit
  25. 21 Apr, 2023 · 1 commit
  26. 17 Apr, 2023 · 3 commits