Commits · fb3313e99c957101e30d913b60a95f02904ecf2d · 机器未来 / Paddle

07 Jan, 2022 · 1 commit

Add multi tensor for adam (#38010) · fb3313e9

Committed by zhangbo9674 on Jan 07, 2022

* add multi tensor for adam

* add merged_adam op

* refine code

* refine adam compute logic

29 Dec, 2021 · 1 commit

fix lamb beta1pow beta2pow update (#38518) · 3672480b

Committed by sneaxiy on Dec 29, 2021

28 Dec, 2021 · 1 commit

fix adamw epsilon in cuda kernel (#37746) · 6f1bb3d6

Committed by Guoxia Wang on Dec 28, 2021

24 Dec, 2021 · 1 commit

[AMP] Add multi_precision for sgd (#38231) · a4d07bb9

Committed by zhangbo9674 on Dec 24, 2021

17 Dec, 2021 · 1 commit

Refine some AMP operators for BERT (#37923) · d80fe268

Committed by sneaxiy on Dec 17, 2021

* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci

03 Dec, 2021 · 1 commit

refine structure for cuda and rocm (#37202) · a6d2fddb

Committed by ronnywang on Dec 03, 2021

* refine structure for cuda and rocm

* update

* update

* update

* update

30 Nov, 2021 · 1 commit

[opt] Add regularization and Nesterov for merged_momentum op (#37527) · c8ffdecb

Committed by zhangbo9674 on Nov 30, 2021

* add regularization and Nesterov for merged_momentum

* refine unittest for use_nesterov attr

* refine op check

* refine code

* fix bug

* refine code of regularization_flag

* delete useless code

29 Nov, 2021 · 1 commit

Add third batch of deprecated mkldnn namespace name changes (#37558) · 1ba81500

Committed by piotrekobiIntel on Nov 29, 2021

27 Nov, 2021 · 1 commit

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

Committed by Aganlengzi on Nov 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enforce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

17 Nov, 2021 · 1 commit

copy beta pow to same place when skip_update=1 (#37245) · 5e4b419b

Committed by Leo Chen on Nov 17, 2021

* copy beta pow to same place when skip_update=1

* fix xpu

20 Oct, 2021 · 1 commit

fix pow2 decay (#36559) · 605e7f08

Committed by Zeng Jinle on Oct 20, 2021

19 Oct, 2021 · 1 commit

Add pow2_decay_with_linear_warmup op (#36421) · 305b99a0

Committed by Zeng Jinle on Oct 19, 2021

* add pow2_warmup op

* remove contrib __all__

* add AttrT

* rename

* follow comments

* fix duplicate PADDLE_RESTRICT

17 Oct, 2021 · 1 commit

refine rescale_grad (#36490) · 4e036fa1

Committed by Zeng Jinle on Oct 17, 2021

15 Oct, 2021 · 2 commits

Remove wrong __restrict__ of CUDA LarsMomentumOpKernel (#36460) · adb80494

Committed by Zeng Jinle on Oct 15, 2021

* remove wrong restrict

* remove master_param_out __restrict__

* update

fix momentum ops (#36452) · 4dda18a8

Committed by Zeng Jinle on Oct 15, 2021

14 Oct, 2021 · 3 commits

fix lars (#36431) · 8256f6fa

Committed by Zeng Jinle on Oct 14, 2021

refine merge lars (#36428) · 63fd7d66

Committed by Zeng Jinle on Oct 14, 2021

Merge momentum ops/kernels (#36380) · f4eda869

Committed by Zeng Jinle on Oct 14, 2021

* merge momentum ops

* update

* add ut to improve coverage

* remove optimizer change

* fix error msg

* update ut

* add __restrict__ for CUDA

* update ut

* move merged_momentum_op to optimizer dir

* fix coverage

13 Oct, 2021 · 1 commit

Merge lars op (#35476) · 0c31579c

Committed by limingshu on Oct 13, 2021

* A leap of try for cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lars cuda kernel

* Fix bugs

* a test for lars merge

* Adding las_op_momentum infer_shape

* Fix codes

* use avg_numel instead of max_numel to acquire grid num

* modify unittest files about lars op

* Finally converge when merged-lars works

* fix ctest files

* add merged_operation kernel when cuda version is older than 11

* Fix code style

* fix ctest failure

* fix error

* fix all ctest error and change lars compute code of cpu

* fix bugs on v100.

* revert python modification about lars

* revert python modification codes

27 Sep, 2021 · 1 commit

Lars op optimization with cudaLaunchCooperativeKernel method (#35652) · a112ce42

Committed by limingshu on Sep 27, 2021

* A leap of try for cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lars cuda kernel

* Fix bugs

* fix code according to comments

* fix code according to review comments

* adding some function overload

* relocate the power operation.

21 Sep, 2021 · 1 commit

Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (#35510) · 799f3861

Committed by Adam Osewski on Sep 20, 2021

* Create stateful OneDNNAXPYHandler object.

This makes it possible to call it multiple times without recreating the
oneDNN primitives every time.

* Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.

* OneDNN SGD kernel.

* Update call to use new OneDNNAXPYHandler object api.

* Setup seed in proper place.

* Enable OneDNN kernel only for single case.

* For dense param and sparse grad.

* Small refactor.

* Enable oneDNN by op attr or by cmd line flag.

* Use int64_t type for number of elements.

* Support dense param and grad from OneDNN kernel.

* Enable SGD OneDNN kernel when use MP BF16 optimizer.

* Force non-copyable/movable OneDNNAXPYHandler.

* Reuse OneDNNAXPYHandler for sparse tensors in SUM op.

* Fix SFINAE rules.

* Remove recording event inside AXPY.

* Get rid of internal primitive caching.

* Stop using the PP cache mechanism to store mem and primitive obj.
* Handler obj stores and reuses needed desc & prim

* Do not derive from MKLDNNHandlerT

14 Sep, 2021 · 1 commit

add layerwise learning rate for adamw (#35569) · 91cf918f

Committed by zhaoyingli on Sep 14, 2021

* add layerwise learning rate for adamw

* fix format

* add unit test

* add NotImplementedError

* add gpu unit test

* update gpuplace

13 Sep, 2021 · 1 commit

add xpu_wait & new implementation replace memcpy in adam, adamw (#35437) · 86a6be1a

Committed by taixiurong on Sep 13, 2021

03 Sep, 2021 · 1 commit

fix bn_infer and optimize momentum for kunlun (#35250) · 8305ba37

Committed by TTerror on Sep 03, 2021

27 Aug, 2021 · 1 commit

sparse_momentum_op is used to save w@GRAD memory for gather_op (#34942) · 234ce932

Committed by Guoxia Wang on Aug 27, 2021

* sparse_momentum_op is used to save w@GRAD memory for gather_op when gather from a large parameter

25 Aug, 2021 · 1 commit

fix cpu adamw problem for np.float64 (#35124) · 700205e8

Committed by zhaoyingli on Aug 25, 2021

23 Aug, 2021 · 1 commit

add adamw cuda kernel (#35020) · 77a8a394

Committed by zhaoyingli on Aug 23, 2021

* adamw support cuda

* adamw support cuda

18 Aug, 2021 · 1 commit

[NPU]add rmsprop op (#34864) · 9cbba97b

Committed by lzzyzlbb on Aug 18, 2021

* [npu]add rmsprop op

17 Aug, 2021 · 1 commit

[NPU]Adamw skip update for npu (#34897) · b4474fb4

Committed by Roc on Aug 17, 2021

11 Aug, 2021 · 1 commit

[NPU] add momentum_op_npu and test (#34082) · 9e3e08f0

Committed by ronnywang on Aug 11, 2021

* add momentum_op_npu and test

* update

* fix hang

22 Jul, 2021 · 1 commit

copy found_inf to cpu in advance to improve performance (#34274) · 781f4028

Committed by Leo Chen on Jul 22, 2021

* copy found_inf to cpu in advance to improve performance

* add npu test

* add npu test

* refine code

* refine memcpy op

* fix adam

14 Jul, 2021 · 1 commit

adam op adds input SkipUpdate (#34075) · e1e3e3b4

Committed by Leo Chen on Jul 14, 2021

* adam add input SkipUpdate

* add unittest

* add npu unittest

* fix xpu compile

* remove param stream

21 Jun, 2021 · 1 commit

Add AXPY oneDNN handler (#33632) · 773aabc7

Committed by lidanqing on Jun 21, 2021

* Add oneDNN AXPY handler.

* Add fallback for small tensors.

* Fix ifdefs

* Remove unnecessary namespace prefixes and add missing headers.

* Guard handler_axpy with proper ifdefs.

* Compilation of this function is possible only when Paddle is built with neither CUDA nor HIP.

* Move AXPY handler code to separate files.

* Use oneDNN AXPY handler in SGD op.

* Use axpy handler only when Paddle is built with oneDNN.

* Add test for SUM BF16 with big rows.

* Fix SFINAE rules for elementwise_add_to.

* Add test case for SGD with big rows.

* update

* update
Co-authored-by: Adam Osewski <adam.osewski@intel.com>

03 Jun, 2021 · 1 commit

multi precision for lars op and lars optimizer (#33280) · 4d805e6a

Committed by Yuang Liu on Jun 03, 2021

26 May, 2021 · 1 commit

[NPU] refine NpuOpRunner (#32869) · 8259d9bf

Committed by Leo Chen on May 26, 2021

* refine ~npuOpRunner

* implement destructor and forbid copy

* use reference to avoid copy

* use const reference

* relax adam precision

* fix top_k

13 May, 2021 · 1 commit

[NPU] support global accumulator for adam (#32780) · dace3fd5

Committed by Leo Chen on May 13, 2021

* add use_global_beta_pow

* add use_global_beta_pow

* update npu kernel

* update python api

* refine code

* add ut for use_global_beta_pow

* fix npu kernel

* add ut for api

* add ut for exception

* add ut for save/load

10 May, 2021 · 1 commit

fix npu compile error (#32820) · 5aa8faa2

Committed by Leo Chen on May 10, 2021

28 Apr, 2021 · 1 commit

[NPU] add input EpsilonTensor for adam (#32605) · 119cda3d

Committed by Leo Chen on Apr 28, 2021

* add input EpsilonTensor for adam

* update python api

* add unit test

* add npu test

* add more ut

23 Apr, 2021 · 1 commit

[NPU] Fix bug that epsilon becomes 0 using power (#32469) · 49773f36

Committed by Leo Chen on Apr 23, 2021

19 Apr, 2021 · 1 commit

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

Committed by Leo Chen on Apr 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* [NPU] fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and add wait before sync operation (#31956)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: pangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>

机器未来 / Paddle · In sync with the fork source project