1. 10 Jun, 2023 1 commit
  2. 08 Jun, 2023 1 commit
  3. 05 Jun, 2023 1 commit
  4. 01 Jun, 2023 2 commits
    • Support static graph code generation for conv2d, conv3d, depthwise_conv2d (#54201) · f3eccb3f
      huangjiyi committed
      * update
      
      * update cmake
      
      * update
      
      * update
      
      * update
      
      * update
      
      * Revert "update cmake"
      
      This reverts commit 1e1dc1b2bc9967b725201272607f939260070fd4.
      
      * update
      
      * update
      
      * update
      
      * update
    • mv all unittests test (#53235) · b0e86d55
      tianshuo78520a committed
      * mv all unittests test
      
      * fix error
      
      * fix error
      
      * fix
      
      * fix
      
      * del unittests
      
      * fix paddle_build.sh
      
      * fix
      
      * fix test
      
      * fix add test
      
      * fix
      
      * fix
      
      * fix
      
      * merge develop
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * merge develop
      
      * fix test_async_read_write
      
      * fix test_async_read_write
      
      * merge develop
      
      * fix
      
      * fix import legacy_test
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix bug
      
      * fix
      
      * fix coverage test bug
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix code style
      
      * fix code
      
      * fix code
      
      * fix
      
      * fix
      
      * fix
      
      * del test_sequence_enumerate_op.py
      
      * fix
  5. 24 May, 2023 1 commit
  6. 23 May, 2023 2 commits
  7. 19 May, 2023 1 commit
    • Add flash attention to speedup fused_gate_attention. (#52731) · d29c1f8e
      limingshu committed
      * Reorganize the forward codes of flash-attention.
      
      * Fix forward.
      
      * Remove some unused code.
      
      * Simplify codes and fix backward.
      
      * Change all LOG(INFO) to VLOG and fix the backward.
      
      * add scale for AF2 flash_attn, many thanks to xreki and shaojie for debugging this code
      
      * decrease the effect of debug print on performance
      
      * Unify the initialization of flashattn arguments.

      * Rewrite the reshape of temp_mask and temp_bias.
      
      * API support use_flash_attn.
      
      * Fix compiling error on CI.
      
      * Try to crop the flash-attention lib.
      
      * Correct the condition for whether flash-attn can be used.
      
      * Remove the softmax_out argument.
      
      * Remove is_causal.
      
      * Polish codes.
      
      * Fix qkv_transpose_out's shape and scaling of Q * K.
      
      * Update commit of flash-attention.
      
      ---------
      Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
  8. 18 May, 2023 2 commits
    • Fused elementwises kernels and ops (#51427) · fb4a6ecf
      Hulek committed
      * Fused elementwises kernels and ops
      
      * change fuse pass name
      
      * adjust .pbtxt files
      
      * adjust quantization attributes
      
      * add missing arguments and fix others, review fixed
      
      * simplify fused kernel registration
      
      * fix elementwise unit tests
      
      * reuse one fused elementwise op
      
      * adjust proto
      
      * Add supported datatypes
      
      * Change 'Scale' to 'scale' in tests, change some tests to onednn
      
      * Revert breaking changes
      
      * Fix unit tests
      
      * Delete obsolete test cases
      
      * Delete commented out code
      
      * Fix codestyle
      
      * delete temporary condition
      
      * fix conflicts and delete duplicate fusing
      
      * Fix code after merge
      
      * Move tests to new directory
      
      * fix tests volatility
      
      * Rename test_elementwise_add_onednn_op.py to test_elementwise_add_mkldnn_op.py
      
      * Update CMakeLists.txt add mkldnn op test
      
      ---------
      Co-authored-by: Silv3S <slawomir.siwek@intel.com>
    • move fusion_group kernel to phi (#53781) · 26da689d
      huangjiyi committed
  9. 16 May, 2023 2 commits
    • remove some [-Wunused-parameter] warning and fix a file to pass cpplint (#53814) · 10a38b4e
      Galaxy1458 committed
      * test,test=develop
      
      * test,test=develop
      
      * test,test=develop
      
      * test,test=develop
      
      * test,test=develop
    • Move fused batchnorm to Phi (#53476) · 5e5481d8
      Sonder committed
      * trans fused batch norm Compute function
      
      * trans batch norm register info to phi
      
      * trans fused batch norm grad Compute
      
      * trans batch norm grad register info
      
      * add sig file
      
      * update sig file
      
      * Update fused_bn_activation_kernel.cu
      
      * Update fused_bn_activation_grad_kernel.cu
      
      * fix
      
      * Rename fused_bn_activation_kernel_grad.cu to fused_bn_activation_kernel.cu
      
      * fix
      
      * fix
      
      * fix CudnnDataType error
      
      * fix
      
      * fix include
      
      * update
      
      * add #if
      
      * add fused bn act to cmakelist.txt
      
      * update  cmakelist
      
      * fix #ifdef error
      
      * add timeout set
      
      * add env set
      
      * fix
      
      * fix
      
      * Update fused_bn_activation_sig.cc
  10. 15 May, 2023 1 commit
  11. 11 May, 2023 1 commit
  12. 09 May, 2023 1 commit
  13. 05 May, 2023 1 commit
  14. 28 Apr, 2023 1 commit
    • Dropout optimize & clean broadcast inT and ElementwiseType (#52969) · d611e48c
      Bo Zhang committed
      * change judgement for DropoutGradGPUKernelDriver
      
      * add UnrollerWithoutVecSize and after this Loaddata to be refined
      
      * pass unittest
      
      * use same unroller with XPU
      
      * BroadcastWithInt64Index
      
      * BroadcastDataLoader template partial specialization
      
      * fix compile errs in ROCms
      
      * clean ElementwiseT and InT for BroadcastKernel
      
      * default axis and clean inT
      
      * remove redundant fast divmod computation
      
      * optimize drop_nd & drop_nd_grad
      
      * optimize BroadcastDataLoader bf16 fp16
      
      * rm InT etc. after merge develop
      
      * delete constexpr for windows ci
      
      * fix conflict
      
      * fix conflict with develop

      * fix conflict
      
      * new clean
      
      * clean
  15. 27 Apr, 2023 2 commits
    • Move fused feedforward (#53166) · 25b4ba7f
      Sonder committed
      * trans fused_feedforward Compute function to phi
      
      * add register info
      
      * remove maxfunctor
      
      * move fused feedforward to phi
      
      * remove sig file
      
      * remove fluid include
      
      * add include
      
      * add include
      
      * add sig file
      
      * add output register info
      
      * fix sig file
      
      * Update fused_feedforward_sig.cc
      
      * fix grad kernel
      
      * update output register info
      
      * fix
      
      * open fused_feedforward static build
      
      * add optional and fix code style
      
      * fix output info for fused attention
      
      * add optional param
      
      * merge
    • Register fluid xpu kernels to phi [part 2] (#53188) · eee9c788
      huangjiyi committed
      * update
      
      * fix bug
  16. 26 Apr, 2023 1 commit
  17. 25 Apr, 2023 1 commit
    • [PHI]Add flags macro for PHI (#52991) · 22e96bde
      YuanRisheng committed
      * add flags for phi
      
      * fix compile bugs
      
      * fix ci bugs
      
      * fix inference bugs
      
      * fix cinn bugs
      
      * fix cinn bugs
      
      * polish code according to review comments
      
      * fix ci bugs
      
      * fix ci bugs
  18. 24 Apr, 2023 1 commit
    • Move fused feedforward xpu (#53196) · 83c2e682
      Sonder committed
      * add sig file
      
      * trans fused feedforward compute function to phi
      
      * remove fluid include
      
      * delete old register info
      
      * fix build error
      
      * trans fused feedforward grad xpu to phi
  19. 19 Apr, 2023 4 commits
  20. 18 Apr, 2023 1 commit
  21. 17 Apr, 2023 1 commit
  22. 14 Apr, 2023 1 commit
    • Move fused_attention op to phi [migrate backward GPU OpKernel] (#51909) · 3bac6264
      Sonder committed
      * add kernel functions
      
      * update kernel functions
      
      * update func parameters' name
      
      * create codes for gpu device
      
      * adjust file locations
      
      * fix include error
      
      * remove dependent files to phi/
      
      * restore fused_attention_op.cu
      
      * fix dependence errors
      
      * fix dependence errors
      
      * fix include error
      
      * fix all dependence errors [build success]
      
      * remove useless include
      
      * recover useless include
      
      * use phi::ToNCCLDataType
      
      * fix namespace
      
      * update new register code
      
      * fix error in fused_gemm_epilogue_utils
      
      * fix error in FusedAttentionKernel param

      * finish fused_attention register code [build success]
      
      * add paddle::optional
      
      * add sig file
      
      * fix build error
      
      * fix an include error

      * restore forward code

      * update CMakeList
      
      * trans Compute function to phi [build success]
      
      * add register code and fix include error [build success]
      
      * fix parameter sequence
      
      * add include file
      
      * update #if before include
      
      * update #if before include
      
      * fix grammar error
      
      * update codes for DropoutParam
      
      * remove const cast
      
      * trans some fluid api to phi api
      
      * remove const cast
      
      * trans some fluid api to phi api
      
      * add #if
      
      * update test code
      
      * update test codes
      
      * recover test codes
      
      * fix namespace and remove fluid include
      
      * recover random seed
      
      * remove fluid quant_helper
      
      * fix include error
      
      * include utils in funcs
      
      * change include file
      
      * move grad codes back to fluid folder

      * move grad codes back to fluid folder
      
      * fix sig file error
      
      * update include
      
      * recover codes to develop
      
      * update register codes
      
      * fix build error
      
      * recover fluid include
      
      * remove some fluid include
      
      * remove some fluid include
      
      * Update fused_attention_op.cu
      
      * remove fluid include
      
      * add some fluid include
      
      * Update fused_attention_op.cu
      
      * Update fused_attention_op.cu
      
      * Update fused_attention_op.cu
      
      * Update fused_attention_op.cu
      
      * remove useless include
  23. 10 Apr, 2023 1 commit
  24. 06 Apr, 2023 1 commit
    • Move fused_attention op to phi [migrate forward GPU OpKernel] (#51743) · a7ec8958
      Sonder committed
      * add kernel functions
      
      * update kernel functions
      
      * update func parameters' name
      
      * create codes for gpu device
      
      * adjust file locations
      
      * fix include error
      
      * remove dependent files to phi/
      
      * restore fused_attention_op.cu
      
      * fix dependence errors
      
      * fix dependence errors
      
      * fix include error
      
      * fix all dependence errors [build success]
      
      * remove useless include
      
      * recover useless include
      
      * use phi::ToNCCLDataType
      
      * fix namespace
      
      * update new register code
      
      * fix error in fused_gemm_epilogue_utils
      
      * fix error in FusedAttentionKernel param

      * finish fused_attention register code [build success]
      
      * add paddle::optional
      
      * add sig file
      
      * fix build error
      
      * fix an include error

      * update CMakeList
      
      * fix parameter sequence
      
      * add include file
      
      * update #if before include
      
      * fix grammar error
      
      * update codes for DropoutParam
      
      * remove const cast
      
      * trans some fluid api to phi api
      
      * add #if
      
      * update test code
      
      * update test codes
      
      * recover test codes
      
      * trans fused_attention to fluid
      
      * move #endif to end
      
      * move #endif
      
      * delete useless files
      
      * use fused attention utils and recover random seed
      
      * remove fluid include in phi
  25. 04 Apr, 2023 1 commit
  26. 30 Mar, 2023 1 commit
    • Speedup worker (#51760) · 8ca86d72
      pangengzheng committed
      * support run haokanctr model in heterps-models
      
      * polish setup.py
      
      * polish JVM_LIB in evn_dict
      
      * align infer auc with DistPsArch pre-stable
      
      * async and multi thread data feed
      
      * rewrite dense tensor initialization
      
      * async infer shape and reuse memory
  27. 27 Mar, 2023 1 commit
    • Fused elementwise_(mul/div) (#50428) · 968f7f24
      Sławomir Siwek committed
      * extract Op and OPMaker to .h
      
      * extend pattern for fused_op
      
      * set "with_residual" default to false
      
      * adjust fuse passes
      
      * remove fc+eltwise flag
      
      * fused_output_scale
      
      * activation attrs
      
      * remove extra attrs
      
      * fix int8/bf16 unit tests
      
      * simplify RecomputeOutputDims
      
      * remove unused method
      
      * Add description for attributes
      
      * add extra check
      
      * adjust op compats
      
      * update quantize test
      
      * fix protobuf parsing error
      
      * fix int8 performance
      
      * fused elementwises
      
      * merge develop
      
      * remove activation
      
      * restore activation for existing add/sub ops
  28. 23 Mar, 2023 1 commit
  29. 22 Mar, 2023 3 commits
    • Extract fused_transpose op dedicated for oneDNN fuse passes (#50021) · 02296977
      Sławomir Siwek committed
      * extract common methods to reuse
      
      * add header for transpose ops
      
      * fused_transpose
      
      * Split big function
      
      * transpose2 tests
      
      * fused_transpose
      
      * Apply extra attributes
      
      * add pbtxt file
      
      * update pbtxt
      
      * Merge develop
      
      * add more strict op compats
      
      * code  style
      
      * remove mkldnn_data_type
      
      * unify SetOutMemDescWithReshape2FuseSupport
      
      * adjust quantize-dequantize for transpose
      
      * remove appendact
      
      * transpose2 quantization
      
      * fix int8 tests
      
      * adjust transpose_op to current develop
      
      * delete fusion code from transpose_kernel
      
      * add fused transpose to NHWC unittest
      
      * change order
    • Add fused_linear_param_grad_add_kernel (#51805) · f59c5d8b
      sneaxiy committed
      * add fused_linear_param_grad_add_kernel
      
      * fix compile error
      
      * remove flag
      
      * fix ci compile error
      
      * fix ci compile error
      
      * revert pylayer revision
      
      * fix ci ut
      
      * improve performance
    • Fix conflict of CppTypeToDataType (#51919) · 535ddd3d
      Ruibiao Chen committed
  30. 21 Mar, 2023 1 commit
    • [PHI decoupling] Move DataType* from paddle::experimental to phi namespace (#51716) · 4638a62e
      iSerendipity committed
      * move DataType from paddle::experimental to phi
      
      * convert namespace
      
      * convert namespace
      
      * convert namespace
      
      * clarify namespace
      
      * convert more datatype
      
      * Revert "convert more datatype"
      
      This reverts commit 083b462959e6a22d4d8767707b628b95b396642e.
      
      * convert more in auto_code_generator
      
      * fix conflicts for XPU
      
      * fix namespace conflicts
      
      * fix errors
      
      * Revert "fix errors"
      
      This reverts commit f9d9958b54ee32141112274c8a5c3c381ab0f876.
      
      * fix errors
      
      * fix formatting