1. 26 4月, 2022 3 次提交
    • F
      Adapt BKCL comm for XPUPS (#42168) · fccb0819
      Fan Zhang 提交于
      * Adapt XPUPS - 1st version - 3.24
      
      * Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24
      
      * Adapt XPUPS - add XPU PullSparseOp - 3nd version - 3.25
      
      * refactor heter comm kernel
      
      * update. test=develop
      
      * Adapt XPUPS - modify by compilation - 4th version - 3.27
      
      * update calc_shard_offset. test=develop
      
      * update xpu kernel. test=develop
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * heter_comm update
      
      * heter_comm update
      
      * update calc_shard_offset. test=develop
      
      * heter_comm update
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30
      
      * update. test=develop
      
      * update pslib.cmake
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * Adapt XPUPS - modify by kp compilation  - 6th version - 3.30
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * used by minxu
      
      * update heter_comm_inl
      
      * fix. test=develop
      
      * Adapt XPUPS - modify by kp compilation  - 7th version - 3.30
      
      * fix. test=develop
      
      * add optimizer kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 3.31 update
      
      * Adapt XPUPS - update kp compilation path  - 8th version - 3.31
      
      * add optimizer kernel. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm_kernel.kps 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm_kernel.kps 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm.h 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update hashtable. test=develop
      
      * update. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 9th version - 4.1
      
      * update hashtable. test=develop
      
      * fix. test=develop
      
      * update hashtable 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 10th version - 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * modify by compilation 4.1
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1 19:30
      
      * fix. test=develop
      
      * update ps_gpu_wrapper.kps 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 11th version - 4.1
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 12nd version - 4.2
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.2
      
      * 4.2 update
      
      * fix. test=develop
      
      * template init. test=develop
      
      * update 4.6
      
      * fix. test=develop
      
      * template init. test=develop
      
      * 4.6 modify by compilation
      
      * hashtable template init. test=develop
      
      * hashtable template init. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 13nd version - 4.7
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.11 update
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.11 update
      
      * update by pre-commit
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.12 update
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 14th version - 4.13
      
      * 4.13 update
      
      * 4.14 update
      
      * 4.14 update
      
      * 4.14 update
      
      * 4.14 modify by merged latest compilation
      
      * retry CI 4.14
      
      * 4.15 pass static check
      
      * 4.15 modify by gpups CI
      
      * 3.16 update by gpups CI - modify ps_gpu_wrapper.h
      
      * 4.16 update
      
      * 4.16 pass xpu compile
      
      * 4.16 retry CI
      
      * 4.16 update
      
      * Adapt XPUPS - adapt BKCL comm for XPUPS - 4.24
      
      * update by compilation
      
      * Adapt XPUPS - register PSGPUTrainer for XPUPS - 4.25
      
      * update device_worker_factory
      Co-authored-by: Nzmxdream <zhangminxu01@baidu.com>
      fccb0819
    • Z
      Optimize the performanece of sum api (#42231) · 2fe4bf2f
      zyfncg 提交于
      * optimize the performanece of sum api
      
      * optimize IsDenseTensorInput
      
      * remove debug log
      2fe4bf2f
    • X
      Add C++ EinsumOp which support 2 operands einsum. (#42105) · c7302f96
      xiongkun 提交于
      * full api fix
      
      * when out is None, go old dygraph mode
      
      * by static check
      
      * first version: support 2-inputs forwards. TODO: 1. backward  2. BroadCast  3. MultiVariable
      
      * time out -> 120
      c7302f96
  2. 25 4月, 2022 2 次提交
    • B
      【PaddlePaddle Hackathon 2】24、为 Paddle 新增 nn.ChannelShuffle 组网 API (#40743) · bbaaf217
      BrilliantYuKaimin 提交于
      * Add infermeta for ChannelShuffle
      
      * Create channel_shuffle_grad_kernel.h
      
      * Create channel_shuffle_kernel.h
      
      * Create channel_shuffle_sig.cc
      
      * Create channel_shuffle_op.cc
      
      ChannelShuffle算子的描述
      
      * Create channel_shuffle_kernel_impl.h
      
      ChannelShuffle核函数的实现
      
      * Create channel_shuffle_grad_kernel_impl.h
      
      ChannelShuffle反向核函数的实现
      
      * Add kernel register of channel shuffle and grad
      
      注册ChannelShuffle及其反向的核函数
      
      * add nn.functional.channel_shuffle
      
      * add nn.ChannelShuffle
      
      * Create test_channel_shuffle.py
      
      * Update example of ChannelShuffle in vision.py
      
      * Update test_channel_shuffle.py
      
      * 修改channel_shuffle核函数的实现位置
      
      * 修正代码格式
      
      * 删除多余空格
      
      * 完善channel_shuffle的错误检查
      
      * Update unary.cc
      
      * Update channel_shuffle_op.cc
      
      * Update test_channel_shuffle.py
      
      * Update unary.cc
      
      * add channel_shuffle
      
      * Update test_channel_shuffle.py
      
      * Update vision.py
      
      * 调整代码格式
      
      * Update channel_shuffle_sig.cc
      
      * 更新ChannelShuffle的文档
      
      * 更新channel_shuffle的文档
      
      * remove ChannelShuffleOpArgumentMapping
      
      * add ChannelShuffleGradInferMeta
      
      * Update channel_shuffle_op.cc
      
      * 调整channel_shuffle及其梯度的核函数的位置
      bbaaf217
    • C
      Optimize dygraph GetExpectedKernelType perf (#42154) · 3a0d7bf0
      Chen Weihang 提交于
      * opt dygraph scheduling
      
      * revert part impl
      3a0d7bf0
  3. 23 4月, 2022 1 次提交
  4. 22 4月, 2022 1 次提交
    • M
      [WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72
      Ming-Xu Huang 提交于
      * Fix leading dimension setting error in fused_gemm_epilogue_grad_op.
      
      * Add dyload to cuBlasLt functions.
      
      * Added cublasLtMatmulAlgoGetHeuristic to improve performance.
      
      * Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue
      
      * Added UTs to FLAGS_cublaslt_exhaustive_search_times
      
      * Added warmup runs in algo searching of Gemm epilogue.
      
      * Update copyright and documents.
      
      * Fixed error handling.
      19650d72
  5. 21 4月, 2022 3 次提交
  6. 20 4月, 2022 3 次提交
    • B
      【PaddlePaddle Hackathon 2】9、为 Paddle 新增 logspace API (#41261) · a3c50c42
      BrilliantYuKaimin 提交于
      * 增加logspace的算子描述
      
      * 增加logspace的形状推断
      
      * 增加logspace核函数实现
      
      * 在python中增加logspace接口
      
      * 增加logspace单测
      
      * 增加logspace
      
      * Update logspace_kernel.cu
      
      * Update logspace_op.cc
      
      * 调整代码格式
      
      * Update doc of logspace
      
      * Update tensor.py
      
      * Update logspace_op.cc
      
      * Update logspace_kernel.cc
      
      * Update logspace_kernel.cu
      
      * Update test_logspace.py
      
      * 调整 logspace 的位置
      
      * 调整代码格式
      a3c50c42
    • F
      [MLU] add gather mlu kernel (#41969) · 23ad2166
      fwenguang 提交于
      23ad2166
    • T
      enable auto-tune when using cinn (#41795) · d70104e5
      TeFeng Chen 提交于
      * optimize preparation overhead before executing cinn compiled program
      
      * update code notes
      
      * fix flag annotation
      
      * enable auto-tune when using CINN
      
      * update cinn commit tag
      
      * skip test
      
      * fix lacking header file
      d70104e5
  7. 19 4月, 2022 8 次提交
  8. 18 4月, 2022 5 次提交
  9. 17 4月, 2022 2 次提交
    • F
      XPUPS Adaptation (#40991) · 0ef3ef28
      Fan Zhang 提交于
      * Adapt XPUPS - 1st version - 3.24
      
      * Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24
      
      * Adapt XPUPS - add XPU PullSparseOp - 3nd version - 3.25
      
      * refactor heter comm kernel
      
      * update. test=develop
      
      * Adapt XPUPS - modify by compilation - 4th version - 3.27
      
      * update calc_shard_offset. test=develop
      
      * update xpu kernel. test=develop
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * heter_comm update
      
      * heter_comm update
      
      * update calc_shard_offset. test=develop
      
      * heter_comm update
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30
      
      * update. test=develop
      
      * update pslib.cmake
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * Adapt XPUPS - modify by kp compilation  - 6th version - 3.30
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * used by minxu
      
      * update heter_comm_inl
      
      * fix. test=develop
      
      * Adapt XPUPS - modify by kp compilation  - 7th version - 3.30
      
      * fix. test=develop
      
      * add optimizer kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 3.31 update
      
      * Adapt XPUPS - update kp compilation path  - 8th version - 3.31
      
      * add optimizer kernel. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm_kernel.kps 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm_kernel.kps 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm.h 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update hashtable. test=develop
      
      * update. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 9th version - 4.1
      
      * update hashtable. test=develop
      
      * fix. test=develop
      
      * update hashtable 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 10th version - 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * modify by compilation 4.1
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1 19:30
      
      * fix. test=develop
      
      * update ps_gpu_wrapper.kps 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 11th version - 4.1
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 12nd version - 4.2
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.2
      
      * 4.2 update
      
      * fix. test=develop
      
      * template init. test=develop
      
      * update 4.6
      
      * fix. test=develop
      
      * template init. test=develop
      
      * 4.6 modify by compilation
      
      * hashtable template init. test=develop
      
      * hashtable template init. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 13nd version - 4.7
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.11 update
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.11 update
      
      * update by pre-commit
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.12 update
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 14th version - 4.13
      
      * 4.13 update
      
      * 4.14 update
      
      * 4.14 update
      
      * 4.14 update
      
      * 4.14 modify by merged latest compilation
      
      * retry CI 4.14
      
      * 4.15 pass static check
      
      * 4.15 modify by gpups CI
      
      * 3.16 update by gpups CI - modify ps_gpu_wrapper.h
      
      * 4.16 update
      
      * 4.16 pass xpu compile
      
      * 4.16 retry CI
      
      * 4.16 update
      Co-authored-by: Nzmxdream <zhangminxu01@baidu.com>
      0ef3ef28
    • C
      [Perf] Optimize dygraph scheduling performance (#41696) · 7ee31a96
      Chen Weihang 提交于
      * split phi and fluid infermeta context
      
      * resolve conflict
      
      * fix type error
      
      * optimize scheduling perf
      
      * spec small vector size
      
      * replace all grad var name
      
      * fix test failed
      
      * move init defalut signature
      
      * polish details
      
      * polish details
      
      * fix no init bug
      
      * init sig for tests
      
      * add init sig for infer
      
      * fix infrt error
      
      * fix infrt failed
      
      * fix kunlun error
      
      * fix infrt failed
      7ee31a96
  10. 16 4月, 2022 1 次提交
  11. 15 4月, 2022 8 次提交
  12. 14 4月, 2022 3 次提交
    • L
      fbe2c311
    • Y
      Optimize the finding of max workspace size. (#41741) · 3ce879db
      Yiqun Liu 提交于
      3ce879db
    • S
      FC+elementwise_add (residual connection) (#41776) · 92d8d0bc
      Sławomir Siwek 提交于
      * Change tensor name to match activation
      
      * declare fc_eltwise_add pass
      
      * merge conv_eltwise refactor PR
      
      * first compilable draft
      
      * unittest feedback tools
      
      * Fuse pass tester
      
      * Move IsReachable() to shared file
      
      * 100% coverage of fuse_pass_tester.cc
      
      * register pass
      
      * Add bias node
      
      * Improve unit tests / remove bias node from pattern
      
      * improve fc_eltwiseadd_unittest
      
      * cancel eltwise_add fuse if act is already fused
      
      * Add elementwise_input scale
      
      * Residual MVP
      
      * Add new FC attrs
      
      * Add more test cases
      
      * Add missing op attrs
      
      * Adapt code to new Elementwise pattern
      
      * reuse existing fcpattern
      
      * improve code style
      
      * remove unused arguments
      
      * fix typo
      
      * remove whitespace
      
      * remove int8 related code
      
      * Remove attributes from base ops
      
      * style
      
      * style check
      
      * Remove input from base op
      
      * Set attribute during fuse
      
      * ut timeout
      
      * download and test model
      
      * DRY
      
      * apply feedback from review
      
      * Style check
      
      * fix typo
      
      * cosmetic changes
      
      * explicitly set residual as output
      
      * VIT-OCR accuracy check
      
      * trigger CI
      
      * remove whitespaces
      
      * fix missing data file
      92d8d0bc