1. 09 Aug 2023: 2 commits
  2. 08 Aug 2023: 1 commit
  3. 07 Aug 2023: 2 commits
    • [cherry-pick] Integration flash attention 2 (#56015) · cc9a7688
      umiswing committed
      * [FlashAttn] add flash randomness control (#52902)
      
      * add flash randomness control
      
      * fix VLOG undefined
      
      * [WIP] Integration flash attention 2 (#55758)
      
      * Works for fa-2 padded fwd; code to be cleaned.
      
      * Works for fa2 unpadded fwd.
      
      * Works for padded bwd; dk gets a small diff with np.random.seed(0).
      
      * Passes paddle's unit tests, except for returning softmax without dropout.
      
      * Clean code.
      
      * Modify interface.
      
      * Clean code and add some check.
      
      * Easy compile for dev.
      
      * Fix ci.
      
      * Fix ci-build.
      
      * Add std c++17 option again.
      
      * Limit max jobs when compiling fa2.
      
      * Remove const_cast
      
      * Add fwd params, to be cleaned.
      
      * Clean code.
      
      * Add bwd params.
      
      * Clean code.
      
      * Add enforce.
      
      * Use v2.0.4
      
      * Pass RNG state to fa2 capi
      
      * Fix review.
      
      * Add assert
      
      * Skip compile for sm less than 80.
      
      ---------
      Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
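      A rough usage sketch of the FlashAttention-2 path added by the cherry-pick above; the import path and signature (paddle.nn.functional.flash_attention.flash_attention with dropout/causal/return_softmax arguments returning an (out, softmax) pair) are assumptions about the Paddle API of this era, not code taken from the PR, and an SM80+ GPU is required.

      ```python
      import paddle
      from paddle.nn.functional.flash_attention import flash_attention

      # FlashAttention-2 runs on half-precision inputs of shape
      # [batch, seq_len, num_heads, head_dim]; the commit above skips
      # compilation for SM < 80, so an A100-class GPU is assumed.
      q = paddle.randn([2, 128, 8, 64]).astype("float16")
      k = paddle.randn([2, 128, 8, 64]).astype("float16")
      v = paddle.randn([2, 128, 8, 64]).astype("float16")

      # Dropout randomness is driven by Paddle's RNG state, which the
      # commit forwards to the fa2 C API ("Pass RNG state to fa2 capi").
      out, softmax = flash_attention(q, k, v, dropout=0.1, causal=True,
                                     return_softmax=False)
      print(out.shape)  # [2, 128, 8, 64]
      ```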
    • cherry-pick fused_rope from develop (#55931) · 8d3a9882
      niuliling123 committed
      * Add fused_rope forward op (#54351)
      
      * style
      
      * more
      
      * update ctest
      
      * Update legacy_backward.yaml
      
      * Update legacy_ops.yaml
      
      * Update legacy_ops.yaml
      
      * update
      
      * update
      
      * update for move
      
      * Update the rope op according to the comments (#54985)
      
      * Update multiary.cc
      
      * Update __init__.py
      
      * for int64_t and assert
      
      * more
      
      * remove useless assert first
      
      ---------
      Co-authored-by: sneaxiy <sneaxiy@126.com>
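      A minimal sketch of the fused rotary position embedding added by the cherry-pick above; the import path and signature (paddle.incubate.nn.functional.fused_rotary_position_embedding taking q/k/v and returning the rotated tensors) are assumptions and may differ in your Paddle build.

      ```python
      import paddle
      from paddle.incubate.nn.functional import fused_rotary_position_embedding

      # Rotary embedding is applied per head over the last dimension;
      # inputs follow the [batch, seq_len, num_heads, head_dim] convention.
      q = paddle.randn([2, 128, 8, 64]).astype("float16")
      k = paddle.randn([2, 128, 8, 64]).astype("float16")
      v = paddle.randn([2, 128, 8, 64]).astype("float16")

      out_q, out_k, out_v = fused_rotary_position_embedding(q, k, v)
      print(out_q.shape)  # [2, 128, 8, 64]
      ```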
  4. 18 Jul 2023: 1 commit
  5. 13 Jul 2023: 2 commits
  6. 29 Jun 2023: 1 commit
  7. 14 Jun 2023: 1 commit
    • support sharding stage1 (#54069) · 974676bc
      pangengzheng committed
      * support sharding stage1
      
      * fix unittest
      
      * format
      
      * pass sharded params_and_grads to inner_opt apply_optimize
      
      * change sharding gradient allreduce to reduce
      
      * support saving state_dict adaptively and support sharding with mp
      
      * fix sharding test
      
      * test set_state_dict
      
      * add more unit test
      
      * fix global norm of mp case
      
      * polish
      
      * hack the global-norm calculation to remove the diff between HybridParallelClipGrad and dp when computing global norm values
      
      * remove print
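      A rough sketch of enabling stage-1 sharding through fleet's hybrid configs, in the spirit of the commit above; the config keys and the toy training loop are illustrative assumptions, not code from this PR.

      ```python
      import paddle
      import paddle.distributed.fleet as fleet

      # Launch with: python -m paddle.distributed.launch --gpus "0,1" train.py
      strategy = fleet.DistributedStrategy()
      strategy.hybrid_configs = {
          "dp_degree": 1,
          "mp_degree": 1,
          "pp_degree": 1,
          "sharding_degree": 2,  # stage 1: optimizer states sharded across 2 ranks
      }
      fleet.init(is_collective=True, strategy=strategy)

      model = paddle.nn.Linear(1024, 1024)
      opt = paddle.optimizer.AdamW(parameters=model.parameters(),
                                   grad_clip=paddle.nn.ClipGradByGlobalNorm(1.0))

      # fleet wraps model and optimizer; gradients are reduced (not allreduced)
      # to the rank owning each shard, as the commit messages above describe.
      model = fleet.distributed_model(model)
      opt = fleet.distributed_optimizer(opt)

      x = paddle.randn([8, 1024])
      loss = model(x).mean()
      loss.backward()
      opt.step()
      opt.clear_grad()
      ```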
  8. 29 May 2023: 1 commit
  9. 26 May 2023: 1 commit
  10. 19 May 2023: 1 commit
  11. 15 May 2023: 3 commits
  12. 14 May 2023: 1 commit
  13. 12 May 2023: 1 commit
  14. 29 Apr 2023: 1 commit
  15. 26 Apr 2023: 2 commits
  16. 25 Apr 2023: 1 commit
  17. 21 Apr 2023: 1 commit
  18. 14 Apr 2023: 17 commits
    • [AMP OP&Test] Cumprod support fp16 and bf16 (#52919) · 8a850af6
      Zhang Zheng committed
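      A minimal check of the fp16 path added above, assuming a CUDA device with half-precision support:

      ```python
      import paddle

      # Cast to float16 and take the cumulative product along the last axis.
      x = paddle.rand([4, 8]).astype("float16")
      y = paddle.cumprod(x, dim=-1)
      print(y.dtype, y.shape)  # paddle.float16 [4, 8]
      ```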
    • 【Hackathon4 No58】logcumsum logsum (#51275) · 468869e4
      cyberslack_lee committed
    • 【Hackathon4 No58】kthvalue (#51615) · 43efb979
      cyberslack_lee committed
    • 【Hackathon No.62】Improve FP16/BF16 unit tests for the digamma and dirichlet operators (#52604) · 7ecbcc08
      chenxujun committed
      * Add digamma, dirichlet tests
      
      * Fix code
    • 【Hackathon No.55】add erf FP16 test and BF16 test (#52136) · eeb4d165
      superwinner1 committed
      * add erf FP16 test
    • Add angle, bmm tests (#52630) · 6d7ee668
      chenxujun committed
    • [Dcu]: Add rocsparse_spmm for dcu. (#52200) · 281ea2f4
      umiswing committed
    • [Zero-Dim] support 0-D tensor for reduce/reshape/stack/prelu/expand_v2/gaussian onednn kernels (#52185) · 6f41e177
      YangQun committed
      
      * support 0-D tensor for reduce/reshape/stack/prelu/expand_v2/gaussian ops
      
      * fix gaussian random mkldnn op ut
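      A small, generic illustration of the 0-D (scalar) tensor semantics these kernels now accept; it is not tied to the oneDNN backend.

      ```python
      import paddle

      x = paddle.to_tensor(2.0)   # 0-D tensor: shape []
      y = paddle.reshape(x, [1])  # reshape 0-D -> 1-D
      z = paddle.stack([x, x])    # stacking 0-D tensors yields a 1-D tensor
      print(x.shape, y.shape, z.shape)  # [] [1] [2]
      ```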
    • [Decouple enforce.h] Move LOG from enforce.h to enforce.cc (#52883) · b33f95b0
      HongyuJia committed
      * [Decouple enforce.h] Move LOG from enforce.h to enforce.cc
      
      * update cmake of device_context.cc, solve cuda_device_context_allocator.h compile error
      
      * add namespace inside macro
    • Modify set_value op, use Scalars to represent attr `values` (#52408) · dd2a749a
      Feiyu Chan committed
      1. Modify set_value op to use Scalars to represent the attr `values`, instead of a bunch of attributes of various types.
      
      2. Add a program converter, with set_value op as an example; it provides the functionality to convert `paddle::framework::ProgramDesc` between the old and new formats (the differences are mainly some operators with incompatible updates to their definitions).
      3. The program version and operator version map are now always saved when serializing `paddle::framework::ProgramDesc`, so the version can be identified.
      4. Provide an option `legacy_format=false` in the serialization of `paddle::framework::ProgramDesc`; it decides whether to convert the ProgramDesc back to the legacy format, which paddle 2.4.2 and earlier versions can load and execute.
      5. Deserialization of `paddle::framework::ProgramDesc` now automatically detects whether the bytes it receives are in the legacy format (i.e. contain any operator that has been incompatibly updated and has an attribute of type `Scalar`) and converts them to the new format. If you want a faithful deserialization without the automatic conversion, use protobuf's deserialization instead; it is not recommended, but it can be used for testing.
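      For context, slice assignment in a static-graph program is what lowers to the set_value op whose `values` attribute this commit reworks into Scalars. A minimal sketch (the exact op list printed depends on the Paddle version):

      ```python
      import paddle

      paddle.enable_static()
      main = paddle.static.Program()
      with paddle.static.program_guard(main):
          x = paddle.zeros([4, 4], dtype="float32")
          # Slice assignment with a Python scalar is lowered to a set_value op;
          # after #52408 the assigned values travel as Scalar attributes.
          x[1:3, :] = 1.5

      print([op.type for op in main.global_block().ops])
      # e.g. ['fill_constant', 'set_value']
      ```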
    • [phi] move sequence_pool to phi - Step 2: sequence_pool_op (#52750) · b281b221
      gouzil committed
      * [phi] move sequence_pool kernel to phi
      
      * [phi] mv sequence_pooling to phi funcs
      
      * [phi] mv sequence_pooling_test
      
      * [phi] RollBACK `paddle/fluid/operators/sequence_ops/sequence_pool_op.cc`
      
      * [phi][funcs] fix mutable_data
      
      * [phi][funcs] fix mutable_data
    • Move fused_attention op to phi [migrate backward GPU OpKernel] (#51909) · 3bac6264
      Sonder committed
      * add kernel functions
      
      * update kernel functions
      
      * update func parameters' name
      
      * create codes for gpu device
      
      * Adjust file locations
      
      * fix include error
      
      * remove dependent files to phi/
      
      * restore fused_attention_op.cu
      
      * fix dependence errors
      
      * fix dependence errors
      
      * fix include error
      
      * fix all dependence errors [build success]
      
      * remove useless include
      
      * recover useless include
      
      * use phi::ToNCCLDataType
      
      * fix namespace
      
      * update new register code
      
      * fix error in fused_gemm_epilogue_utils
      
      * fix error in FusedAttentionKernel params
      
      * finish fused_attention register code [build success]
      
      * add paddle::optional
      
      * add sig file
      
      * fix build error
      
      * fix a include error
      
      * Restore the forward code
      
      * update CMakeList
      
      * trans Compute function to phi [build success]
      
      * add register code and fix include error [build success]
      
      * fix parameter sequence
      
      * add include file
      
      * update #if before include
      
      * update #if before include
      
      * fix grammar error
      
      * update codes for DropoutParam
      
      * remove const cast
      
      * trans some fluid api to phi api
      
      * remove const cast
      
      * trans some fluid api to phi api
      
      * add #if
      
      * update test code
      
      * update test codes
      
      * recover test codes
      
      * fix namespace and remove fluid include
      
      * recover random seed
      
      * remove fluid quant_helper
      
      * fix include error
      
      * include utils in funcs
      
      * change include file
      
      * move grad codes back to fluid folder
      
      * move grad codes back to fluid folder
      
      * fix sig file error
      
      * update include
      
      * recover codes to develop
      
      * update register codes
      
      * fix build error
      
      * recover fluid include
      
      * remove some fluid include
      
      * remove some fluid include
      
      * Update fused_attention_op.cu
      
      * remove fluid include
      
      * add some fluid include
      
      * Update fused_attention_op.cu
      
      * Update fused_attention_op.cu
      
      * Update fused_attention_op.cu
      
      * Update fused_attention_op.cu
      
      * remove useless include
    • fix some [-Wunused-function] and [-Wunused-function] warnings (#52868) · ab163063
      Galaxy1458 committed
      * test,test=develop
      
      * test,test=develop
      
      * test,test=develop
    • add backend config to select kernel (#52907) · 1ab7e77a
      lzydev committed
    • fix win cu116 compile error (#52894) · 60ba559a
      sneaxiy committed
    • update (#52875) · ce6978c6
      huangjiyi committed
    • 54e4360a