- 20 1月, 2022 1 次提交
-
-
由 zhangbo9674 提交于
* fix mp * support merged_momentum for mp
-
- 18 1月, 2022 1 次提交
-
-
由 Zhanlue Yang 提交于
* Merged LoDTensor with Tensor,test=allcases * Patched python level LoDTensor * Patched python level LoDTensor * Merge Tensor into DenseTensor * Fixed namespace issues,test=allcases * Fixed merge issues * Fixed inference issues * Fixed NPU test issues * Fixed merge issues
-
- 17 1月, 2022 1 次提交
-
-
由 sneaxiy 提交于
-
- 10 1月, 2022 1 次提交
-
-
由 Zhanlue Yang 提交于
* Added shared_ptr<Allocation> member & corresponding interfaces to Storage * Removed original pten::Allocation from Storage and adjusted the interfaces accordingly * Fixed issues with storage offset * Used place to malloc allocation for TensorStorage * [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor * Fixed issues with place * Added comments * Moved mutable_data with stream argument to DenseTensor * Added set_offset interface * Fixed CI issues,test=allcases * [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor * Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor * Modified framework::Tensor to inherit from DenseTensor * Reverted changes too pten_layout() interface * Removed friend classes * Rearranged cfunction calls from tensor.data<void>() to tensor.data() * Fixed CI issues * Fixed lite issues * Fixed data() interface issues,test=allcases * Resolved IsInitialized() issues * Fixed ResetHolder() issues * Fixed MKLDNN & Storage issues * Resolved ShareBufferWith() issues * Fixed LoD issues
-
- 07 1月, 2022 1 次提交
-
-
由 zhangbo9674 提交于
* add multi tensor for adam * add merged_adam op * refine code * refine adam compute logic
-
- 29 12月, 2021 1 次提交
-
-
由 sneaxiy 提交于
-
- 28 12月, 2021 1 次提交
-
-
由 Guoxia Wang 提交于
-
- 24 12月, 2021 1 次提交
-
-
由 zhangbo9674 提交于
-
- 17 12月, 2021 1 次提交
-
-
由 sneaxiy 提交于
* support multi precision update for LAMB * hide some api * fix ci uts * fix lamb output of dygraph * remove some changes to some PR * try to fix Py3 CI compile error * fix test_imperative_optimizer, add lars ut, add layer_norm ut * fix ut, fix format * fix ut * fix windows ci
-
- 03 12月, 2021 1 次提交
-
-
由 ronnywang 提交于
* refine structure for cuda and rocm * update * update * update * update
-
- 30 11月, 2021 1 次提交
-
-
由 zhangbo9674 提交于
* add regularation and Nesterov for mergerd_momentum * refine unittest for use_nesterov attr * refine op check * refine code * fix bug * refine code of regularization_flag * delete useless code
-
- 29 11月, 2021 1 次提交
-
-
由 piotrekobiIntel 提交于
-
- 27 11月, 2021 1 次提交
-
-
由 Aganlengzi 提交于
* [NPU] reorganization for device API abstraction * [NPU] delete old files * [NPU] fix npu_collective_helper * [NPU] fix collective_helper * [NPU] fix ut * [NPU] mod memory allocation and hccl_helper * [NPU] fix place_type * [NPU] split enfoce.h * move acl* call into npu_info * merge conflict * fix merge * merge conflict * merge conflict
-
- 17 11月, 2021 1 次提交
-
-
由 Leo Chen 提交于
* copy beta pow to same place when skip_update=1 * fix xpu
-
- 20 10月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 19 10月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
* add pow2_warmup op * remove contrib __all__ * add AttrT * rename * follow comments * fix duplicate PADDLE_RESTRICT
-
- 17 10月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 15 10月, 2021 2 次提交
-
-
由 Zeng Jinle 提交于
* remove wrong restrict * remove master_param_out __restrict__ * update
-
由 Zeng Jinle 提交于
-
- 14 10月, 2021 3 次提交
-
-
由 Zeng Jinle 提交于
-
由 Zeng Jinle 提交于
-
由 Zeng Jinle 提交于
* merge momentum ops * update * add ut to improve coverage * remove optimizer change * fix error msg * update ut * add __restrict__ for CUDA * update ut * move merged_momentum_op to optimizer dir * fix coverage
-
- 13 10月, 2021 1 次提交
-
-
由 limingshu 提交于
* A leap of try for cudaLaunchCooperativeKernel * fix bugs * Totally replace the lar cuda kernel * Fix bugs * a test for lars merge * Adding las_op_momentum infer_shape * Fix codes * use avg_numel instead of max_numel to acquire grid num * modify unittest files about lars op * Finally converge when merged-lars works * fix ctest files * add merged_operation kernel when cuda version is older than 11 * Fix code style * fix ctest failure * fix error * fix all ctest error and change lars compute code of cpu * fix bugs on v100. * revert python modififation about lars * revert python modification codes
-
- 27 9月, 2021 1 次提交
-
-
由 limingshu 提交于
* A leap of try for cudaLaunchCooperativeKernel * fix bugs * Totally replace the lar cuda kernel * Fix bugs * fix code according to comments * fix codes according to review comments * adding some function overload * relocate the power operation.
-
- 21 9月, 2021 1 次提交
-
-
由 Adam Osewski 提交于
* Create stateful OneDNNAXPYHandler object. This makes it possible to call it multiple times without recreating the oneDNN primitives every time. * Prepare SGDOpKernel to reuse its implementation from OneDNN kernel. * OneDNN SGD kernel. * Update call to use new OneDNNAXPYHandler object api. * Setup seed in proper place. * Enable OneDNN kernel only for single case. * For dense param and sparse grad. * Small refactor. * Enable oneDNN by op attr or by cmd line flag. * Use int64_t type for number of elements. * Support dense param and grad from OneDNN kernel. * Enable SGD OneDNN kernel when use MP BF16 optimizer. * Force non-copyable/movable OneDNNAXPYHandler. * Reuse OneDNNAXPYHandler for spare tensors in SUM op. * Fix SFINAE rules. * Remove recording event inside AXPY. * Get rid of internal primitive caching. * Stop use PP cache mechanims to store mem and primitive obj. * Handler obj store and reuse needed desc & prim * Do not derive from MKLDNNHandlerT
-
- 14 9月, 2021 1 次提交
-
-
由 zhaoyingli 提交于
* add layerwise learning rate for adamw * fix format * add unitest * add NotImplementedError * add gpu unitest * update gpuplace
-
- 13 9月, 2021 1 次提交
-
-
由 taixiurong 提交于
-
- 03 9月, 2021 1 次提交
-
-
由 TTerror 提交于
-
- 27 8月, 2021 1 次提交
-
-
由 Guoxia Wang 提交于
* sparse_momentum_op is used to save w@GRAD memory for gather_op when gather from a large parameter
-
- 25 8月, 2021 1 次提交
-
-
由 zhaoyingli 提交于
-
- 23 8月, 2021 1 次提交
-
-
由 zhaoyingli 提交于
* adamw support cuda * adamw support cuda
-
- 18 8月, 2021 1 次提交
-
-
由 lzzyzlbb 提交于
* [npu]add rmsprop op
-
- 17 8月, 2021 1 次提交
-
-
由 Roc 提交于
-
- 11 8月, 2021 1 次提交
-
-
由 ronnywang 提交于
* add momentum_op_npu and test * update * fix hang
-
- 22 7月, 2021 1 次提交
-
-
由 Leo Chen 提交于
* copy found_inf to cpu in advance to improve performance * add npu test * add npu test * refine code * refine memcpy op * fix adam
-
- 14 7月, 2021 1 次提交
-
-
由 Leo Chen 提交于
* adam add input SkipUpdate * add unittest * add npu unittest * fix xpu compile * remove param stream
-
- 21 6月, 2021 1 次提交
-
-
由 lidanqing 提交于
* Add oneDNN AXPY handler. * Add fallback for small tensors. * Fix ifdefs * Remove unnecessary namespace prefixes and add missing headers. * Guard handler_axpy with proper ifdefs. * Compilation of this function is possible only when Paddle is not build with CUDA nor HIP. * Move AXPY handler code to separate files. * Use oneDNN AXPY handler in SGD op. * Use axpy handler only when Paddle is built with oneDNN. * Add test for SUM BF16 with big rows. * Fix SFINAE rules for elementwise_add_to. * Add test case for SGD with big rows. * update * update Co-authored-by: NAdam Osewski <adam.osewski@intel.com>
-
- 03 6月, 2021 1 次提交
-
-
由 Yuang Liu 提交于
-
- 26 5月, 2021 1 次提交
-
-
由 Leo Chen 提交于
* refine ~npuOpRunner * implement destructor and forbid copy * use reference to avoid copy * use const reference * relax adam precision * fix top_k
-
- 13 5月, 2021 1 次提交
-
-
由 Leo Chen 提交于
* add use_global_beta_pow * add use_global_beta_pow * update npu kernel * update python api * refine code * add ut for use_global_beta_pow * fix npu kernel * add ut for api * add ut for exception * add ut for save/load
-