1. 19 Feb 2022, 1 commit
    • Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61
      Committed by sneaxiy
      * add DistributedFusedLamb op
      
      * polish code
      
      * fix compile error
      
      * make compatible with pten changes
      
      * fix rocm compile error
      
      * improve coverage
      
      * update upstream/develop
      
      * fix cast_with_ptr.h
      
      * add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1
      
      * fix clip before allreduce
      
      * add use_master_param_norm
      
      * code polish
      
      * fix bug
      
      * fix ROCM ci
  2. 15 Feb 2022, 2 commits
    • move algorithm.h (#39502) · 7eb9593e
      Committed by Feiyu Chan
      Move paddle/fluid/operators/math/algorithm.h to paddle/pten/kernels/funcs and rename all references to symbols in it.
    • [PTen]Migrate proto::VarType outside of Pten (#39411) · 7e7e9404
      Committed by Aurelius84
      * #1 migrate dist-related type()-> dtype()
      
      * move datatype function from pten -> fluid/framework
      
      * change type() in imperative into convert(dtype())
      
      * modify xx_tensor->type into xx_tensor->dtype
      
      * change the set_type interface and the caller
      
      * modify xx_tensor.type into xx_tensor.dtype
      
      * fix mutable_data(place, dtype())
      
      * change caller of mutable_data in pten and distributed
      
      * change the caller of mutable_data in fluid/framework
      
      * change the caller of mutable_data in imperative directory
      
      * mutable_data: inference
      
      * update the call of mutable_data
      
      * transfer MakePtenScalarArray MakePtenScalar ResetHolderWithType
      
      * pass compilation; the next step is to remove VarType from Pten
      
      * fix all and remove VarType from pten; succeeds on Linux. Next task is other platforms
      
      * fix conflict with develop
      
      * fix compile error
      
      * Fix reset conversion
      
      * fix conflict
      
      * fix compile problem
      
      * fix typo
      
      * Fix << in tensor_utils.cc
      
      * fix type->dtype
      
      * fix unittest
      
      * fix tensor init constructor
      
      * fix DataTypeSize for BFloat16
      
      * fix code style
      
      * fix npu compile error
      
      * fix npu
      
      * compile npu successfully
      
      * fix conflict
      
      * fix conflict
      Co-authored-by: Nxiongkun <xiongkun03@baidu.com>
  3. 11 Feb 2022, 1 commit
  4. 09 Feb 2022, 2 commits
  5. 07 Feb 2022, 1 commit
  6. 27 Jan 2022, 1 commit
  7. 25 Jan 2022, 1 commit
    • [Move selected_rows PR #3] Change the relationship of [include/Cmake]. (#39128) · 2bafd338
      Committed by Weilong Wu
      * Added selected_rows and rw_lock to pten
      
      * Renamed the unit test target to fix CI
      
      * Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid
      
      * Remove rw_lock.h,rw_lock_test.cc in fluid
      
      * Use pten::RWLock and pten::AutoRDLock, fix CI
      
      * Use pten::SelectedRows
      
      * Use pten::SelectedRows
      
      * Fix to pass NPU CI
      
      * Use pten::SelectedRows, to pass NPU CI
      
      * To fix NPU CI
      
      * To fix NPU CI again
  8. 24 Jan 2022, 2 commits
    • [Pten] Migration of eigen numeric extensions and functors in paddle/fluid/operators/eigen (#39124) · a1e40dc6
      Committed by Feiyu Chan
      * migration of functors in paddle/fluid/operators/eigen and paddle/fluid/platform/eigen_ext.h
      * update path of data types like float16.h in includes in extensions.h
    • support sparse of adam, *test=kunlun (#38483) · e106901e
      Committed by z8hanghuan
      * support sparse of adam, *test=kunlun
      
      * add pre-commit-config.yaml
      
      * support sparse of adam in KL2,*test=kunlun
      
      * support sparse of adam in KL2, *test=kunlun
      
      * modify xpu.cmake, *test=kunlun
      
      * support sparse of adam, rm some wait, *test=kunlun
      
      * support sparse of adam, rm some wait, *test=kunlun
      
      * support sparse of adam, *test=kunlun
      
      * support sparse of adam, *test=kunlun
      
      * support sparse of adam, *test=kunlun
      
      * support sparse of adam, *test=kunlun
      
      * support sparse of adam, *test=kunlun
  9. 21 Jan 2022, 1 commit
  10. 20 Jan 2022, 1 commit
  11. 18 Jan 2022, 1 commit
  12. 17 Jan 2022, 1 commit
  13. 10 Jan 2022, 1 commit
    • [Unify Tensors PR #5] framework::Tensor inherits from DenseTensor,test=allcases (#38632) · 5c73a6ea
      Committed by Zhanlue Yang
      * Added shared_ptr<Allocation> member & corresponding interfaces to Storage
      
      * Removed original pten::Allocation from Storage and adjusted the interfaces accordingly
      
      * Fixed issues with storage offset
      
      * Used place to malloc allocation for TensorStorage
      
      * [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor
      
      * Fixed issues with place
      
      * Added comments
      
      * Moved mutable_data with stream argument to DenseTensor
      
      * Added set_offset interface
      
      * Fixed CI issues,test=allcases
      
      * [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor
      
      * Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor
      
      * Modified framework::Tensor to inherit from DenseTensor
      
      * Reverted changes to pten_layout() interface
      
      * Removed friend classes
      
      * Rearranged cfunction calls from tensor.data<void>() to tensor.data()
      
      * Fixed CI issues
      
      * Fixed lite issues
      
      * Fixed data() interface issues,test=allcases
      
      * Resolved IsInitialized() issues
      
      * Fixed ResetHolder() issues
      
      * Fixed MKLDNN & Storage issues
      
      * Resolved ShareBufferWith() issues
      
      * Fixed LoD issues
  14. 07 Jan 2022, 1 commit
  15. 29 Dec 2021, 1 commit
  16. 28 Dec 2021, 1 commit
  17. 24 Dec 2021, 1 commit
  18. 17 Dec 2021, 1 commit
    • Refine some AMP operators for BERT (#37923) · d80fe268
      Committed by sneaxiy
      * support multi precision update for LAMB
      
      * hide some api
      
      * fix ci uts
      
      * fix lamb output of dygraph
      
      * remove some changes to some PR
      
      * try to fix Py3 CI compile error
      
      * fix test_imperative_optimizer, add lars ut, add layer_norm ut
      
      * fix ut, fix format
      
      * fix ut
      
      * fix windows ci
  19. 03 Dec 2021, 1 commit
  20. 30 Nov 2021, 1 commit
  21. 29 Nov 2021, 1 commit
  22. 27 Nov 2021, 1 commit
    • [NPU] reorganization for device API abstraction (#37110) · 72241a6a
      Committed by Aganlengzi
      * [NPU] reorganization for device API abstraction
      
      * [NPU] delete old files
      
      * [NPU] fix npu_collective_helper
      
      * [NPU] fix collective_helper
      
      * [NPU] fix ut
      
      * [NPU] mod memory allocation and hccl_helper
      
      * [NPU] fix place_type
      
      * [NPU] split enforce.h
      
      * move acl* call into npu_info
      
      * merge conflict
      
      * fix merge
      
      * merge conflict
      
      * merge conflict
  23. 17 Nov 2021, 1 commit
  24. 20 Oct 2021, 1 commit
  25. 19 Oct 2021, 1 commit
  26. 17 Oct 2021, 1 commit
  27. 15 Oct 2021, 2 commits
  28. 14 Oct 2021, 3 commits
  29. 13 Oct 2021, 1 commit
    • Merge lars op (#35476) · 0c31579c
      Committed by limingshu
      * First attempt at using cudaLaunchCooperativeKernel
      
      * fix bugs
      
      * Totally replace the lars cuda kernel
      
      * Fix bugs
      
      * a test for lars merge
      
      * Adding las_op_momentum infer_shape
      
      * Fix codes
      
      * use avg_numel instead of max_numel to acquire grid num
      
      * modify unittest files about lars op
      
      * Finally converge when merged-lars works
      
      * fix ctest files
      
      * add merged_operation kernel when cuda version is older than 11
      
      * Fix code style
      
      * fix ctest failure
      
      * fix error
      
      * fix all ctest error and change lars compute code of cpu
      
      * fix bugs on v100.
      
      * revert python modification about lars
      
      * revert python modification codes
  30. 27 Sep 2021, 1 commit
  31. 21 Sep 2021, 1 commit
    • Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (#35510) · 799f3861
      Committed by Adam Osewski
      * Create stateful OneDNNAXPYHandler object.
      
      This makes it possible to call it multiple times without recreating the
      oneDNN primitives every time.
      
      * Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.
      
      * OneDNN SGD kernel.
      
      * Update call to use new OneDNNAXPYHandler object api.
      
      * Setup seed in proper place.
      
      * Enable OneDNN kernel only for single case.
      
      * For dense param and sparse grad.
      
      * Small refactor.
      
      * Enable oneDNN by op attr or by cmd line flag.
      
      * Use int64_t type for number of elements.
      
      * Support dense param and grad from OneDNN kernel.
      
      * Enable SGD OneDNN kernel when use MP BF16 optimizer.
      
      * Force non-copyable/movable OneDNNAXPYHandler.
      
      * Reuse OneDNNAXPYHandler for sparse tensors in SUM op.
      
      * Fix SFINAE rules.
      
      * Remove recording event inside AXPY.
      
      * Get rid of internal primitive caching.
      
      * Stop using PP cache mechanism to store mem and primitive obj.
      * Handler obj store and reuse needed desc & prim
      
      * Do not derive from MKLDNNHandlerT
  32. 14 Sep 2021, 1 commit
  33. 13 Sep 2021, 1 commit
  34. 03 Sep 2021, 1 commit