提交 · df5152551d933487c7e9f0edd47c7066f2c95f86 · PaddlePaddle / Paddle

20 1月, 2022 1 次提交
- Z
  Fix master weight bug for multi_tensor optimizer(momentum, adam) (#38991) · 6b0c57cf
  由 zhangbo9674 提交于 1月 20, 2022
```
* fix mp

* support merged_momentum for mp
```
  6b0c57cf
18 1月, 2022 1 次提交

[Unify Tensors PR ] Merged Tensor into DenseTensor, test=allcases (#38914) · 2052f1e3

由 Zhanlue Yang 提交于 1月 18, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Patched python level LoDTensor

* Merge Tensor into DenseTensor

* Fixed namespace issues,test=allcases

* Fixed merge issues

* Fixed inference issues

* Fixed NPU test issues

* Fixed merge issues

2052f1e3

17 1月, 2022 1 次提交
- S
  
  add squared_l2_norm (#38968) · 6eeb16b8
  由 sneaxiy 提交于 1月 17, 2022
  
  6eeb16b8
10 1月, 2022 1 次提交

[Unify Tensors PR ] framework::Tensor inherits from DenseTensor,test=allcases (#38632) · 5c73a6ea

由 Zhanlue Yang 提交于 1月 10, 2022

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor

* Modified framework::Tensor to inherit from DenseTensor

* Reverted changes too pten_layout() interface

* Removed friend classes

* Rearranged cfunction calls from tensor.data<void>() to tensor.data()

* Fixed CI issues

* Fixed lite issues

* Fixed data() interface issues,test=allcases

* Resolved IsInitialized() issues

* Fixed ResetHolder() issues

* Fixed MKLDNN & Storage issues

* Resolved ShareBufferWith() issues

* Fixed LoD issues

5c73a6ea

07 1月, 2022 1 次提交

Add multi tensor for adam (#38010) · fb3313e9

由 zhangbo9674 提交于 1月 07, 2022

* add multi tensor for adam

* add merged_adam op

* refine code

* refine adam compute logic

fb3313e9

29 12月, 2021 1 次提交
- S
  
  fix lamb beta1pow beta2pow update (#38518) · 3672480b
  由 sneaxiy 提交于 12月 29, 2021
  
  3672480b
28 12月, 2021 1 次提交
- G
  
  fix adamw epsilon in cuda kernel (#37746) · 6f1bb3d6
  由 Guoxia Wang 提交于 12月 28, 2021
  
  6f1bb3d6
24 12月, 2021 1 次提交
- Z
  
  [AMP] Add multi_precision for sgd (#38231) · a4d07bb9
  由 zhangbo9674 提交于 12月 24, 2021
  
  a4d07bb9
17 12月, 2021 1 次提交

Refine some AMP operators for BERT (#37923) · d80fe268

由 sneaxiy 提交于 12月 17, 2021

* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci

d80fe268

03 12月, 2021 1 次提交
- R
  refine structure for cuda and rocm (#37202) · a6d2fddb
  由 ronnywang 提交于 12月 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
  a6d2fddb
30 11月, 2021 1 次提交

[opt] Add regularation and Nesterov for mergerd_momentum op (#37527) · c8ffdecb

由 zhangbo9674 提交于 11月 30, 2021

* add regularation and Nesterov for mergerd_momentum

* refine unittest for use_nesterov attr

* refine op check

* refine code

* fix bug

* refine code of regularization_flag

* delete useless code

c8ffdecb

29 11月, 2021 1 次提交
- P
  
  Add third batch of deprecated mkldnn namespace name changes (#37558) · 1ba81500
  由 piotrekobiIntel 提交于 11月 29, 2021
  
  1ba81500
27 11月, 2021 1 次提交

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

由 Aganlengzi 提交于 11月 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enfoce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

72241a6a

17 11月, 2021 1 次提交
- L
  copy beta pow to same place when skip_update=1 (#37245) · 5e4b419b
  由 Leo Chen 提交于 11月 17, 2021
```
* copy beta pow to same place when skip_update=1

* fix xpu
```
  5e4b419b
20 10月, 2021 1 次提交
- Z
  
  fix pow2 decay (#36559) · 605e7f08
  由 Zeng Jinle 提交于 10月 20, 2021
  
  605e7f08
19 10月, 2021 1 次提交

Add pow2_decay_with_linear_warmup op (#36421) · 305b99a0

由 Zeng Jinle 提交于 10月 19, 2021

* add pow2_warmup op

* remove contrib __all__

* add AttrT

* rename

* follow comments

* fix duplicate PADDLE_RESTRICT

305b99a0

17 10月, 2021 1 次提交
- Z
  
  refine rescale_grad (#36490) · 4e036fa1
  由 Zeng Jinle 提交于 10月 17, 2021
  
  4e036fa1
15 10月, 2021 2 次提交
- Z
  Remove wrong __restrict__ of CUDA LarsMomentumOpKernel (#36460) · adb80494
  由 Zeng Jinle 提交于 10月 15, 2021
```
* remove wrong restrict

* remove master_param_out __restrict__

* update
```
  adb80494
- Z
  
  fix momentum ops (#36452) · 4dda18a8
  由 Zeng Jinle 提交于 10月 15, 2021
  
  4dda18a8
14 10月, 2021 3 次提交
- Z
  
  fix lars (#36431) · 8256f6fa
  由 Zeng Jinle 提交于 10月 14, 2021
  
  8256f6fa
- Z
  
  refine merge lars (#36428) · 63fd7d66
  由 Zeng Jinle 提交于 10月 14, 2021
  
  63fd7d66
- Z
  Merge momentum ops/kernels (#36380) · f4eda869
  由 Zeng Jinle 提交于 10月 14, 2021
```
* merge momentum ops

* update

* add ut to improve coverage

* remove optimizer change

* fix error msg

* update ut

* add __restrict__ for CUDA

* update ut

* move merged_momentum_op to optimizer dir

* fix coverage
```
  f4eda869
13 10月, 2021 1 次提交

Merge lars op (#35476) · 0c31579c

由 limingshu 提交于 10月 13, 2021

* A leap of try for cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lar cuda kernel

* Fix bugs

* a test for lars merge

* Adding las_op_momentum infer_shape

* Fix codes

* use avg_numel instead of max_numel to acquire grid num

* modify unittest files about lars op

* Finally converge when merged-lars works

* fix ctest files

* add merged_operation kernel when cuda version is older than 11

* Fix code style

* fix ctest failure

* fix error

* fix all ctest error and change lars compute code of cpu

* fix bugs on v100.

* revert python modififation about lars

* revert python modification codes

0c31579c

27 9月, 2021 1 次提交

Lars op optimiztion with cudaLaunchCooperativeKernel method (#35652) · a112ce42

由 limingshu 提交于 9月 27, 2021

* A leap of try for cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lar cuda kernel

* Fix bugs

* fix code according to comments

* fix codes according to  review comments

* adding some function overload

* relocate the power operation.

a112ce42

21 9月, 2021 1 次提交

Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (#35510) · 799f3861

由 Adam Osewski 提交于 9月 20, 2021

* Create stateful OneDNNAXPYHandler object.

This makes it possible to call it multiple times without recreating the
oneDNN primitives every time.

* Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.

* OneDNN SGD kernel.

* Update call to use new OneDNNAXPYHandler object api.

* Setup seed in proper place.

* Enable OneDNN kernel only for single case.

* For dense param and sparse grad.

* Small refactor.

* Enable oneDNN by op attr or by cmd line flag.

* Use int64_t type for number of elements.

* Support dense param and grad from OneDNN kernel.

* Enable SGD OneDNN kernel when use MP BF16 optimizer.

* Force non-copyable/movable OneDNNAXPYHandler.

* Reuse OneDNNAXPYHandler for spare tensors in SUM op.

* Fix SFINAE rules.

* Remove recording event inside AXPY.

* Get rid of internal primitive caching.

* Stop use PP cache mechanims to store mem and primitive obj.
* Handler obj store and reuse needed desc & prim

* Do not derive from MKLDNNHandlerT

799f3861

14 9月, 2021 1 次提交

add layerwise learning rate for adamw (#35569) · 91cf918f

由 zhaoyingli 提交于 9月 14, 2021

* add layerwise learning rate for adamw

* fix format

* add unitest

* add NotImplementedError

* add gpu unitest

* update gpuplace

91cf918f

13 9月, 2021 1 次提交
- T
  
  add xpu_wait & new implementation replace memcpy in adam, adamw (#35437) · 86a6be1a
  由 taixiurong 提交于 9月 13, 2021
  
  86a6be1a
03 9月, 2021 1 次提交
- T
  
  fix bn_infer and optimize momentum for kunlun (#35250) · 8305ba37
  由 TTerror 提交于 9月 03, 2021
  
  8305ba37
27 8月, 2021 1 次提交
- G
  sparse_momentum_op is used to save w@GRAD memory for gather_op (#34942) · 234ce932
  由 Guoxia Wang 提交于 8月 27, 2021
```
* sparse_momentum_op is used to save w@GRAD memory for gather_op when gather from a large parameter
```
  234ce932
25 8月, 2021 1 次提交
- Z
  
  fix cpu adamw problem for np.float64 (#35124) · 700205e8
  由 zhaoyingli 提交于 8月 25, 2021
  
  700205e8
23 8月, 2021 1 次提交
- Z
  add adamw cuda kernel (#35020) · 77a8a394
  由 zhaoyingli 提交于 8月 23, 2021
```
* adamw support cuda

* adamw support cuda
```
  77a8a394
18 8月, 2021 1 次提交
- L
  [NPU]add rmsprop op (#34864) · 9cbba97b
  由 lzzyzlbb 提交于 8月 18, 2021
```
* [npu]add rmsprop op
```
  9cbba97b
17 8月, 2021 1 次提交
- R
  
  [NPU]Adamw skip update for npu (#34897) · b4474fb4
  由 Roc 提交于 8月 17, 2021
  
  b4474fb4
11 8月, 2021 1 次提交
- R
  [NPU] add momentum_op_npu and test (#34082) · 9e3e08f0
  由 ronnywang 提交于 8月 11, 2021
```
* add momentum_op_npu and test

* update

* fix hang
```
  9e3e08f0
22 7月, 2021 1 次提交

copy found_inf to cpu in advance to improve performance (#34274) · 781f4028

由 Leo Chen 提交于 7月 22, 2021

* copy found_inf to cpu in advance to improve performance

* add npu test

* add npu test

* refine code

* refine memcpy op

* fix adam

781f4028

14 7月, 2021 1 次提交

adam op adds input SkipUpdate (#34075) · e1e3e3b4

由 Leo Chen 提交于 7月 14, 2021

* adam add input SkipUpdate

* add unittest

* add npu unittest

* fix xpu compile

* remove param stream

e1e3e3b4

21 6月, 2021 1 次提交

Add AXPY oneDNN handler (#33632) · 773aabc7

由 lidanqing 提交于 6月 21, 2021

* Add oneDNN AXPY handler.

* Add fallback for small tensors.

* Fix ifdefs

* Remove unnecessary namespace prefixes and add missing headers.

* Guard handler_axpy with proper ifdefs.

* Compilation of this function is possible only when Paddle is not build
with CUDA nor HIP.

* Move AXPY handler code to separate files.

* Use oneDNN AXPY handler in SGD op.

* Use axpy handler only when Paddle is built with oneDNN.

* Add test for SUM BF16 with big rows.

* Fix SFINAE rules for elementwise_add_to.

* Add test case for SGD with big rows.

* update

* update
Co-authored-by: NAdam Osewski <adam.osewski@intel.com>

773aabc7

03 6月, 2021 1 次提交
- Y
  
  multi pricison for lars op and lars optimizer (#33280) · 4d805e6a
  由 Yuang Liu 提交于 6月 03, 2021
  
  4d805e6a
26 5月, 2021 1 次提交

[NPU] refine NpuOpRunner (#32869) · 8259d9bf

由 Leo Chen 提交于 5月 26, 2021

* refine ~npuOpRunner

* implement destructor and forbid copy

* use reference to avoid copy

* use const reference

* relax adam precision

* fix top_k

8259d9bf

13 5月, 2021 1 次提交

[NPU] support global accumulator for adam (#32780) · dace3fd5

由 Leo Chen 提交于 5月 13, 2021

* add use_global_beta_pow

* add use_global_beta_pow

* update npu kernel

* update python api

* refine code

* add ut for use_global_beta_pow

* fix npu kernel

* add ut for api

* add ut for exception

* add ut for save/load

dace3fd5

PaddlePaddle / Paddle 1 年多 前同步成功

PaddlePaddle / Paddle
1 年多前同步成功