1. 02 Jun, 2022 23 commits
    • T
      Enable fc on bfloat16 (#43154) · cb1a0ec1
      Tomasz Socha committed
      * Enable fc on bfloat16
      
      * Add pass for residual connection
      
      * Disable Residual connection pass for now
      
      * Ban ResidualData from DQ
      
      * style
      
      * WO for python tests
      cb1a0ec1
    • L
      Add generate_proposals_v2 op and expand function of gather op for kunlun. *test=kunlun (#43162) · ff22a9c4
      Leo Guo committed
      * Add generate_proposals_v2 op and unittest for kunlun. *test=kunlun
      
      * Add the assign op to xpu2_op_list and expand the function of gather op. Add the unit-test of generate_proposals_v2. *test=kunlun
      ff22a9c4
    • W
      [Eager] FLAGS_retain_grad_for_all_tensor set false in default (#43142) · 4d3b7d7d
      Weilong Wu committed
      * [Eager] FLAGS_retain_grad set false
      
      * Add FLAGS_retain_grad_ for some tests
      
      * Add FLAGS_retain_grad_ to some tests
      
      * modified set_flags
      
      * modified set_flags
      
      * fix windows-ci and windows-openblas-ci
      
      * import paddle.fluid
      4d3b7d7d
    • W
      [Eager] clear_gradient use set_constant but not zeros_like (#43171) · 1e0ea6a4
      wanghuancoder committed
      * clear_gradient use set_constant but not zeros_like
      1e0ea6a4
    • S
      Fix bug of CUDAGraph kernel parameter comparison (#43163) · 3fcfcd51
      sneaxiy committed
      * fix cuda graph sizeof
      
      * fix tuple type
      3fcfcd51
    • H
      support eager dygraph in moe_layer (#43168) · 5ccc49e7
      Haohongxiang committed
      5ccc49e7
    • L
      Update CI reviewer for distributed API docs (#43166) · 0fbf815c
      Ligoml committed
      ### PR types
      Others
      ### PR changes
      Others
      ### Describe
      update ci reviewer list of api docs
      0fbf815c
    • F
      [XPUPS] modify BKCL comm op register (#43028) · 1bfbcfaf
      Fan Zhang committed
      * Adapt XPUPS - 1st version - 3.24
      
      * Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24
      
      * Adapt XPUPS - add XPU PullSparseOp - 3rd version - 3.25
      
      * refactor heter comm kernel
      
      * update. test=develop
      
      * Adapt XPUPS - modify by compilation - 4th version - 3.27
      
      * update calc_shard_offset. test=develop
      
      * update xpu kernel. test=develop
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * heter_comm update
      
      * heter_comm update
      
      * update calc_shard_offset. test=develop
      
      * heter_comm update
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30
      
      * update. test=develop
      
      * update pslib.cmake
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * Adapt XPUPS - modify by kp compilation  - 6th version - 3.30
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * used by minxu
      
      * update heter_comm_inl
      
      * fix. test=develop
      
      * Adapt XPUPS - modify by kp compilation  - 7th version - 3.30
      
      * fix. test=develop
      
      * add optimizer kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 3.31 update
      
      * Adapt XPUPS - update kp compilation path  - 8th version - 3.31
      
      * add optimizer kernel. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm_kernel.kps 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm_kernel.kps 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm.h 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update hashtable. test=develop
      
      * update. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 9th version - 4.1
      
      * update hashtable. test=develop
      
      * fix. test=develop
      
      * update hashtable 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 10th version - 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * modify by compilation 4.1
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1 19:30
      
      * fix. test=develop
      
      * update ps_gpu_wrapper.kps 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 11th version - 4.1
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 12th version - 4.2
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.2
      
      * 4.2 update
      
      * fix. test=develop
      
      * template init. test=develop
      
      * update 4.6
      
      * fix. test=develop
      
      * template init. test=develop
      
      * 4.6 modify by compilation
      
      * hashtable template init. test=develop
      
      * hashtable template init. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 13th version - 4.7
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.11 update
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.11 update
      
      * update by pre-commit
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.12 update
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 14th version - 4.13
      
      * 4.13 update
      
      * 4.14 update
      
      * 4.14 update
      
      * 4.14 update
      
      * 4.14 modify by merged latest compilation
      
      * retry CI 4.14
      
      * 4.15 pass static check
      
      * 4.15 modify by gpups CI
      
      * 3.16 update by gpups CI - modify ps_gpu_wrapper.h
      
      * 4.16 update
      
      * 4.16 pass xpu compile
      
      * 4.16 retry CI
      
      * 4.16 update
      
      * Adapt XPUPS - adapt BKCL comm for XPUPS - 4.24
      
      * update by compilation
      
      * Adapt XPUPS - register PSGPUTrainer for XPUPS - 4.25
      
      * update device_worker_factory
      
      * Adapt XPUPS - split heter_ps into .cu and .cc - 4.27
      
      * Adapt XPUPS - register pull_box_sparse op under XPU_KP - 4.28
      
      * update
      
      * 5.7 modify ps_gpu_wrapper pull_sparse
      
      * 5.11 update ps_gpu_wrapper CopyKeysKernel
      
      * 5.13 modify calc_shard_offset_kernel & fill_shard_key_kernel
      
      * modify fill_dvals_kernel & PullCopy & c_sync_calc_stream - 5.18
      
      * modify PushCopy & fill_shard_grads_kernel & register push_box_sparse - 5.19
      
      * Adapt XPUPS - modify BKCL comm op register - 5.26
      
      * Adapt XPUPS - modify BKCL comm op register - 5.27
      
      * Adapt XPUPS - modify BKCL comm op register - 5.27v2
      
      * Adapt XPUPS - modify BKCL comm op register - 5.27v3
      
      * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init - 5.30
      
      * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init v2 - 5.30
      
      * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init v3 - 5.30
      
      * Adapt XPUPS - modify c_comm_init_all_op to adapt BKCL init v4 - 5.31
      Co-authored-by: Nzmxdream <zhangminxu01@baidu.com>
      1bfbcfaf
    • T
      Fix for Bfloat16 placement pass. (#43109) · 030b23da
      Tomasz Socha committed
      * Fix bfloat16 placement pass
      
      * Make it nicer
      
      * Fix leftovers
      
      * Style
      030b23da
    • Z
      Support head_dim = 96 in fused_multi_transformer for PLATO-XL (#43120) · 990c5e7f
      Zhang Zheng committed
      * Support head_dim = 96 in fused_multi_transformer in PLATO-XL
      
      * add notes
      990c5e7f
    • 光明和真理
      041000c2
    • C
      add concat_grad mlu kernel. (#43117) · fe911a51
      Chenxiao Niu committed
      fe911a51
    • Z
      add federated learning parameter server(fl-ps) mode (#42682) · d999049f
      ziyoujiyi committed
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud comm ready
      
      * .
      
      * .
      
      * fl-ps v1.0
      
      * .
      
      * support N + N mode
      
      * .
      
      * .
      
      * .
      
      * .
      
      * delete print
      
      * .
      
      * .
      
      * .
      
      * .
      d999049f
    • W
      [Paddle-Inference] new general transformer inference support (#43077) · 2810dfea
      Wangzheee committed
      * new general transformer inference support
      2810dfea
    • Z
      Delete inplace strategy in group_norm_fwd (#43137) · 0cb9dae5
      Zhang Zheng committed
      * Delete inplace strategy in group_norm_fwd
      
      * fix
      0cb9dae5
    • W
      [Eager] first run accumulation node (#43134) · 0f1be6e0
      wanghuancoder committed
      * first run accumulation node
      0f1be6e0
    • S
      Support hetergraph reindex (#43128) · ceb20406
      Siming Dai committed
      * support heter reindex
      
      * add unittest, fix bug
      
      * add comment
      
      * delete empty line
      
      * refine example
      
      * fix codestyle
      
      * add disable static
      ceb20406
    • J
      [Dataloader]Add prefetch_factor in dataloader (#43081) · 2bfe8b2c
      Jackwaterveg committed
      * fix usage of prefetch_factor
      
      * add assert
      
      * add docstring and change prefetch_factor when num_workers=0
      
      * fix doc
      2bfe8b2c
    • G
      67163fb4
    • L
      Extend forward fast layer_norm kernel to support more dimensions. (#43118) · 85baa3c0
      Li Min committed
      * extend forward fast_ln_kernel to support more column values.
      85baa3c0
    • Z
      [AutoParallel] engine.prepare only once (#43093) · 8c7cb3d6
      zhaoyingli committed
      * prepare only once
      8c7cb3d6
    • Z
      bug fix (#43153) · 7ba843e6
      zhaoyingli committed
      7ba843e6
    • S
      Support CUDA Graph for partial graph in dygraph mode (#42786) · d05b940a
      sneaxiy committed
      * support CUDAGraph for partial graph
      
      * add ut
      
      * fix ci
      
      * fix ut again because of eager mode
      
      * fix kunlun ci
      
      * fix win ci
      d05b940a
  2. 01 Jun, 2022 17 commits