Commits · df5152551d933487c7e9f0edd47c7066f2c95f86 · PaddlePaddle / Paddle

21 Sep, 2021 1 commit

Reuse OneDNN handler for SGD and SUM for SelectedRows input tensors. (#35510) · 799f3861

Committed by Adam Osewski on Sep 20, 2021

* Create stateful OneDNNAXPYHandler object.

This makes it possible to call it multiple times without recreating the
oneDNN primitives every time.

* Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.

* OneDNN SGD kernel.

* Update call to use new OneDNNAXPYHandler object api.

* Setup seed in proper place.

* Enable OneDNN kernel only for single case.

* For dense param and sparse grad.

* Small refactor.

* Enable oneDNN by op attr or by cmd line flag.

* Use int64_t type for number of elements.

* Support dense param and grad from OneDNN kernel.

* Enable SGD OneDNN kernel when using MP BF16 optimizer.

* Force non-copyable/movable OneDNNAXPYHandler.

* Reuse OneDNNAXPYHandler for sparse tensors in SUM op.

* Fix SFINAE rules.

* Remove recording event inside AXPY.

* Get rid of internal primitive caching.

* Stop using PP cache mechanism to store mem and primitive obj.
* Handler obj stores and reuses needed desc & prim

* Do not derive from MKLDNNHandlerT

799f3861
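
The first bullet of this commit describes a handler object that is created once and then called repeatedly, so the setup cost is not paid on every call. A minimal Python sketch of that idea follows. It does not use oneDNN: the class name `AxpyHandler`, its `setup_count` counter, and the list-based vectors are hypothetical stand-ins chosen for illustration, not Paddle's actual implementation.

```python
class AxpyHandler:
    """Hypothetical stateful AXPY helper computing y <- alpha * x + y.

    The constructor stands in for the one-time setup (building primitives
    in the real kernel); __call__ can then be invoked many times.
    """

    setup_count = 0  # counts how often the one-time setup ran

    def __init__(self, n, alpha):
        self.n = n
        self.alpha = alpha
        AxpyHandler.setup_count += 1

    def __call__(self, x, y):
        if len(x) != self.n or len(y) != self.n:
            raise ValueError("x and y must both have length n")
        for i in range(self.n):
            y[i] += self.alpha * x[i]


# Build the handler once, then reuse it for several input vectors.
handler = AxpyHandler(n=3, alpha=2.0)
y = [0.0, 0.0, 0.0]
for x in ([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]):
    handler(x, y)  # same object, no repeated setup
```

In this sketch the setup runs exactly once no matter how many vectors are accumulated into `y`, which is the property the commit message attributes to the stateful handler.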

21 Jun, 2021 1 commit

Add AXPY oneDNN handler (#33632) · 773aabc7

Committed by lidanqing on Jun 21, 2021

* Add oneDNN AXPY handler.

* Add fallback for small tensors.

* Fix ifdefs

* Remove unnecessary namespace prefixes and add missing headers.

* Guard handler_axpy with proper ifdefs.

* Compilation of this function is possible only when Paddle is built
with neither CUDA nor HIP.

* Move AXPY handler code to separate files.

* Use oneDNN AXPY handler in SGD op.

* Use axpy handler only when Paddle is built with oneDNN.

* Add test for SUM BF16 with big rows.

* Fix SFINAE rules for elementwise_add_to.

* Add test case for SGD with big rows.

* update

* update

Co-authored-by: Adam Osewski <adam.osewski@intel.com>

773aabc7

14 Apr, 2021 1 commit

adds new CPU kernel for SGD op supporting BF16 data type (#32162) · 3ac6c189

Committed by Adam Osewski on Apr 14, 2021

* Initial draft for SGD BF16 kernel.

* Unit tests for SGD with BF16 data type.

* Add VLOG message to SGD BF16 op CPU kernel.

* Enhance error messages and error types.

* Refactor SGD op kernels to leverage some common code.

* Make it easier to add new kernel invoke code.

* Fix SGD op kernel for sparse grad.

* Unify quotes style.

* Fix error for ROCM compilation.

* Use specialized PADDLE_ENFORCE_xx functions.

3ac6c189

27 Sep, 2020 1 commit
- fix error message (#27318) · d014e29f
  Committed by Chengmo on Sep 27, 2020
```
* fix sgd/momentum/dpsgd/rmsprop error message
```
  d014e29f
24 Oct, 2019 1 commit
- Fix DGC algorithm flow to make it the same as paper (#20758) · 250e72d2
  Committed by WangXi on Oct 24, 2019
  
  250e72d2
08 Mar, 2019 1 commit
- simplify the jitkernel templates and tests · 14a764c9
  Committed by tensor-tang on Mar 08, 2019
```
test=develop
```
  14a764c9
07 Mar, 2019 1 commit
- unify the kernelfuncs cache and add unit test · 802f362a
  Committed by tensor-tang on Mar 07, 2019
```
test=develop
```
  802f362a
04 Mar, 2019 1 commit
- enable sgd jitkernel refer code and test · 92f3cf42
  Committed by tensor-tang on Feb 22, 2019
```
test=develop
```
  92f3cf42
23 Feb, 2019 1 commit
- enable sgd jitkernel refer code and test · a0c37662
  Committed by tensor-tang on Feb 22, 2019
```
test=develop
```
  a0c37662
27 Dec, 2018 2 commits
- Polish code · 5822f7f1
  Committed by minqiyang on Dec 27, 2018
```
test=develop
```
  5822f7f1
- Add support for optimizer · 68e9b841
  Committed by minqiyang on Dec 27, 2018
  
  68e9b841
26 Nov, 2018 1 commit
- Revert the changes of VLOG · 53433d7f
  Committed by minqiyang on Nov 26, 2018
```
test=develop
```
  53433d7f
16 Nov, 2018 1 commit

Refine operator cmake (#14413) · a2d9b344

Committed by Wu Yi on Nov 16, 2018

* wip simplify operator framework

* wip

* wip

* done test=develop

* clean test=develop

* fix test=develop

* fix deps test=develop

* fix cpu build test=develop

* fix tensorrt build test=develop

* fix tests test=develop

* fix test=develop

* fix cpu build test=develop

a2d9b344

13 Nov, 2018 1 commit
- sgd_op optimize selected rows do not enforce id < height · efb5c03f
  Committed by Qiao Longfei on Nov 13, 2018
```
test=develop
```
  efb5c03f
08 Nov, 2018 1 commit
- Change the origin VLOG level to 10 times · 0c3227a5
  Committed by minqiyang on Nov 08, 2018
```
Fix code to support cpplint syntax check

test=develop
```
  0c3227a5
17 Aug, 2018 1 commit
- Optimize selected rows for dist lookup table with pthread rwlock (#12635) · 653fad08
  Committed by Qiao Longfei on Aug 17, 2018
```
Optimize selected rows for dist lookup table with rwlock 
```
  653fad08
05 Jun, 2018 1 commit
- Fix signed-unsigned comparison warning (#11167) · 71b6bdb5
  Committed by Siddharth Goyal on Jun 04, 2018
  
  71b6bdb5
29 May, 2018 1 commit
- fix sgd for SelectedRows bug · 5825196d
  Committed by qiaolongfei on May 29, 2018
  
  5825196d
17 Apr, 2018 1 commit
- update · ca327508
  Committed by Yancey1989 on Apr 17, 2018
  
  ca327508
13 Apr, 2018 1 commit
- Fix warnings in sgd_op.h · 3794027d
  Committed by Abhinav Arora on Apr 12, 2018
  
  3794027d
03 Apr, 2018 2 commits
- optimize code · 31e8d807
  Committed by qiaolongfei on Apr 03, 2018
  
  31e8d807
- sgd_op support optimize SelectedRows · 2669aea6
  Committed by qiaolongfei on Apr 03, 2018
  
  2669aea6
09 Mar, 2018 1 commit
- Fix sparse update memory error for distributed training (#8837) · 84680379
  Committed by Yancey on Mar 09, 2018
```
Fix sparse update memory error for distributed training
```
  84680379
12 Feb, 2018 1 commit
- Fix the grammar in copyright. (#8403) · 24509f4a
  Committed by qingqing01 on Feb 12, 2018
  
  24509f4a
10 Feb, 2018 2 commits
- Correct #include path · fc374821
  Committed by Yi Wang on Feb 09, 2018
  
  fc374821
- Move file to fluid/; Edit CMakeLists.txt · 90648f33
  Committed by Yi Wang on Feb 09, 2018
  
  90648f33
23 Dec, 2017 1 commit
- refine sgd-op · 02fda711
  Committed by chengduoZH on Dec 23, 2017
  
  02fda711
12 Dec, 2017 1 commit

Refine device context (#6433) · 61ec0b95

Committed by QI JUN on Dec 12, 2017

The main fixes are as follows:

- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class  `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`

61ec0b95

18 Oct, 2017 3 commits
- fix gpu build error · f9681459
  Committed by qijun on Oct 17, 2017
  
  f9681459
- add sparse sgd operator unittest · ab8cc401
  Committed by qijun on Oct 17, 2017
  
  ab8cc401
- add sparse kernel of sgd operator · 182ce51c
  Committed by qijun on Oct 17, 2017
  
  182ce51c
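
The commit titles above say only that a sparse kernel and its unit test were added to the SGD operator. As a rough illustration of what a sparse SGD step does, the Python sketch below updates just the rows named by a sparse gradient. The `SelectedRows` dataclass here is a simplified, hypothetical stand-in (a height, a list of row indices, and one value row per index), and `sgd_update_sparse` is an illustrative name, not a Paddle function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SelectedRows:
    """Simplified stand-in for a sparse tensor: only listed rows carry values."""
    height: int                 # number of rows in the full dense tensor
    rows: List[int]             # indices of the rows that are present
    values: List[List[float]]   # one value row per entry in `rows`


def sgd_update_sparse(param, grad, lr):
    """In place: param[r] -= lr * g for each (r, g) pair in the sparse gradient."""
    if grad.height != len(param):
        raise ValueError("gradient height must match the number of param rows")
    for r, g in zip(grad.rows, grad.values):
        for j, gj in enumerate(g):
            param[r][j] -= lr * gj
    return param


# Row 1 appears twice, so its two gradient rows are both applied.
param = [[1.0, 1.0] for _ in range(4)]
grad = SelectedRows(height=4, rows=[1, 3, 1],
                    values=[[2.0, 2.0], [4.0, 4.0], [2.0, 2.0]])
sgd_update_sparse(param, grad, lr=0.25)
```

Rows that do not appear in `grad.rows` are left untouched, which is why a sparse update can be much cheaper than a dense one when only a few rows receive gradients.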
05 Oct, 2017 4 commits
- optimize the dsize · 8ebc31d9
  Committed by qiaolongfei on Oct 04, 2017
  
  8ebc31d9
- remove using in sgd header file · 775c6024
  Committed by qiaolongfei on Oct 04, 2017
  
  775c6024
- use EigenScalar to get learning_rate from GPU device · ee7b3ed0
  Committed by qiaolongfei on Oct 04, 2017
  
  ee7b3ed0
- Changing SGD inputs and outputs to conform to Operator naming convention (#4586) · eed2c1e1
  Committed by Abhinav Arora on Oct 04, 2017
  
  eed2c1e1
04 Oct, 2017 1 commit
- Changing learning rate from type Input(float) to Input(tensor) (#4578) · 324876bb
  Committed by Abhinav Arora on Oct 03, 2017
  
  324876bb
03 Oct, 2017 1 commit
- Changing learning rate from attribute to input(float) (#4568) · 42e7fe05
  Committed by Abhinav Arora on Oct 02, 2017
```
* Changing learning rate from attribute to input(float)
* Removing obsolete code
```
  42e7fe05
28 Sep, 2017 1 commit
- Add Skeleton of Double support · 3a5693e0
  Committed by Yu Yang on Sep 27, 2017
  
  3a5693e0
06 Sep, 2017 1 commit
- Change `Op::GetAttr` to `Op::Attr` · 9de6a4b3
  Committed by Yu Yang on Sep 05, 2017
```
Fix #3902
```
  9de6a4b3
04 Sep, 2017 1 commit
- add GetAttr to InferShapeContext · d323831a
  Committed by qiaolongfei on Sep 03, 2017
  
  d323831a

PaddlePaddle / Paddle synced successfully more than 1 year ago
