Commits · be3777a50a08fa06f0a700f1fd5bead38ac47e1b · BaiXuePrincess / Paddle

02 Dec, 2020 2 commits

Add pure fp16 training with master weights. (#27712) · be3777a5

Committed by Zhen Wang on Dec 02, 2020

* add the weight decay func for the momentum op

* Add the multi_precision function in Momentum Optimizer.

* Make sure that the initial values of the master weights are the same as the fp16 weights.

* add static loss scaling.

* add the rescale_grad function in the pure fp16 training.

* use the original momentum updating method.

* Polish some code, such as variable names.

* add docstring for apis.

* update the var creation details of _create_master_weight.

* do not modify the code for imperative momentum updating.

* Fix the error of test_dist_sparse_tensor_load_momentum UT.

* add unit test for multi precision fp16 training.

* add more unit tests for CI.

* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.

* For CI Coverage Checking.

be3777a5
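
The commit above combines fp32 master weights, static loss scaling, and a gradient rescale step. The following is a minimal sketch of that general idea, not Paddle's implementation: `to_fp16`, `momentum_step`, and every constant are hypothetical names chosen for illustration, half precision is emulated with the standard library's `struct` format `"e"`, and Python floats stand in for the higher-precision master copy.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def momentum_step(master_w, velocity, scaled_grad_fp16, lr,
                  mu=0.9, rescale_grad=1.0, l2_decay=0.0):
    """One momentum update applied to a high-precision master weight.

    Hypothetical helper for illustration; this is not Paddle's API.
    """
    # Undo the loss scaling, then fold L2 weight decay into the gradient.
    grad = scaled_grad_fp16 * rescale_grad + l2_decay * master_w
    velocity = mu * velocity + grad
    master_w = master_w - lr * velocity
    # The fp16 copy used by forward/backward is refreshed from the master.
    return master_w, velocity, to_fp16(master_w)


LOSS_SCALE = 128.0  # static loss scale (assumed value)
LR = 1e-4
TRUE_GRAD = 1.0

# Update kept purely in fp16: each small step is rounded away.
w_naive = to_fp16(1.0)
for _ in range(10):
    w_naive = to_fp16(w_naive - LR * TRUE_GRAD)

# Same steps with a master weight and a loss-scaled fp16 gradient.
master, vel, w_fp16 = 1.0, 0.0, to_fp16(1.0)
for _ in range(10):
    scaled_grad = to_fp16(TRUE_GRAD * LOSS_SCALE)
    master, vel, w_fp16 = momentum_step(
        master, vel, scaled_grad, LR, mu=0.0, rescale_grad=1.0 / LOSS_SCALE)

# Loss scaling keeps tiny gradients from flushing to zero in fp16.
tiny_unscaled = to_fp16(1e-8)
tiny_scaled = to_fp16(1e-8 * 1024.0)

print(w_naive, master, w_fp16, tiny_unscaled, tiny_scaled)
```

In this sketch the weight kept only in fp16 never moves, because each step is smaller than half the fp16 spacing near 1.0, while the master weight accumulates the steps and its fp16 copy eventually changes.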

Layer norm fp16 (#29169) · 7584bb50

Committed by furnace on Dec 02, 2020

* add fp16 for layer_norm op

* revert layernorm api

* fix forward

* fix forward

* fix backward for layernorm with fp16

* fix unit test for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>

* fix with_mkldnn compile error for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

7584bb50
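
A common reason for giving an fp16 kernel a separate compute type is that reductions such as the mean and variance lose accuracy when accumulated in half precision. The sketch below illustrates that general point only; it assumes, without confirmation from the log, that the `U` in `static_cast<U>` plays this role. `to_fp16` and `layer_norm_fp16` are hypothetical names, and Python floats stand in for the wider type.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def layer_norm_fp16(x_fp16, eps=1e-5):
    """Normalize fp16 inputs using higher-precision statistics (illustrative)."""
    n = len(x_fp16)
    mean = sum(x_fp16) / n                        # accumulated in a wide type
    var = sum((v - mean) ** 2 for v in x_fp16) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Only the final result is rounded back to fp16 storage.
    return [to_fp16((v - mean) * inv_std) for v in x_fp16]


# Accumulating in fp16 stalls once the running sum outgrows the increment.
ones = [to_fp16(1.0)] * 8192
naive_sum = 0.0
for v in ones:
    naive_sum = to_fp16(naive_sum + v)
wide_sum = sum(ones)

y = layer_norm_fp16([to_fp16(v) for v in (1.0, 2.0, 3.0, 4.0)])
print(naive_sum, wide_sum, y)
```

The fp16 running sum stops growing well short of 8192, so a mean computed from it would be wrong, whereas the wide accumulator is exact for this input.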

01 Dec, 2020 5 commits

Improve performance of elementwise_add grad op (#29187) · 116305ea

Committed by Leo Chen on Dec 01, 2020

* pass stop_gradient for cast op

* improve performance of elementwise_add grad

* use tensor copy async

* dygraph branch

* fix dygraph branch

* add ut

116305ea

add deformable_conv op on xpu (#29234) · 07c67d5a
Committed by 卖鱼的哲学 on Dec 01, 2020
```
* rebase develop

* update deformable_conv op on xpu

* update deformable_conv op on xpu
```
07c67d5a

update kunlun conv2d/softmax/elementwise implementation (#29229) · 64f29fbb

Committed by QingshuChen on Dec 01, 2020

* update conv2d & softmax to new xpu api
* test=kunlun

* remove useless comments
* test=kunlun

* remote softmax xpu op
* test=kunlun

* update kunlun softmax
* test=kunlun

* update xpu unittest
* test=kunlun

* fix elementwise_grad bug for kunlun
* test=kunlun

64f29fbb

add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

8f45d142

fix lite unit test. (#29233) · 74c43ac6
Committed by Wilber on Dec 01, 2020

74c43ac6

30 Nov, 2020 5 commits
- Small optimizations for conv2d kernel subroutines. (#29188) · 4096ff94
  Committed by Adam Osewski on Nov 30, 2020
```
- Make sure that oneDNN memory descriptors are created only once at
first iteration.
```
  4096ff94
- Update ps gpu (#29209) · b5c63423
  Committed by 123malin on Nov 30, 2020
```
* fix parameter prefetch & device guard
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: chengmo <chengmo@baidu.com>
```
  b5c63423
- prefetch optimize (#29095) · 03d4665f
  Committed by 123malin on Nov 30, 2020
```
* test=develop, optimize async prefetch
```
  03d4665f
- optimizer amp, all use fp16 communication, overlap last comm and compute (#28957) · 0c2a51d2
  Committed by WangXi on Nov 30, 2020
  
  0c2a51d2
- fix gru gcc7.4 bug for the gru compile · bc6033f8
  Committed by Jack Zhou on Nov 30, 2020
```
fix gru gcc7.4 bug for the gru compile
```
  bc6033f8
28 Nov, 2020 1 commit
- optimize cumsum OP (#29193) · b818429a
  Committed by wangchaochaohu on Nov 28, 2020
  
  b818429a
27 Nov, 2020 4 commits

update expand as op to use the shape of the target tensor instead of the... · 7e5e9934
Committed by lilong12 on Nov 27, 2020
```
update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020)

* update, test=develop
```
7e5e9934
Add eigen gru and fix the dropout bug in the rnn · 085260f3
Committed by Jack Zhou on Nov 27, 2020
```
Add eigen gru and fix the dropout bug in the rnn 
```
085260f3
Fixes mkldnn dygraph learning rate scheduler crashes (#28988) · bc902044
Committed by arlesniak on Nov 27, 2020

bc902044

detect tensorRT plugin fp16 in runtime (#27933) · b9e76a01

Committed by Shang Zhizhou on Nov 27, 2020

* remove -DSUPPORTS_CUDA_FP16 in cuda.cmake

* compile with cuda9

* add some unittest

* notest;test=coverage

* add unittest for trt plugin swish && split

* update ernie unittest

* fix some error message

* remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter

* fix compile error when CUDA_ARCH_NAME < Pascal

* fix compile error

* update unittest timeout

* compile with cuda9

* update error msg

* fix code style

* add some comments

* add define IF_CUDA_ARCH_SUPPORT_FP16

* rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED

b9e76a01

26 Nov, 2020 2 commits
- Fix ops doc for some ops · da71173b
  Committed by Noel on Nov 26, 2020
```
Fix ops doc for some ops 
```
  da71173b
- Add bf16 pool2d and unify bf16 unit tests (#29039) · b0d1ac16
  Committed by joanna.wozna.intel on Nov 26, 2020
```
* Add bf16 pool2d and unify bf16 unit tests

* Add change default ops test
```
  b0d1ac16
25 Nov, 2020 4 commits
- add uint8 for reshape op (#28996) · 582c0a04
  Committed by joejiong on Nov 25, 2020
```
add uint8 for reshape operator
```
  582c0a04
- add xpu elementwise ops (#29031) · a5aa4dc7
  Committed by taixiurong on Nov 25, 2020
  
  a5aa4dc7
- Update pow (#29000) · b04c78ef
  Committed by joejiong on Nov 25, 2020
```
Simple code clean up
```
  b04c78ef
- remove eigen threadpool for the speed up · b2c8a007
  Committed by wawltor on Nov 25, 2020
```
remove eigen threadpool for the speed up
```
  b2c8a007
24 Nov, 2020 2 commits
- update, test=develop (#28700) · 767d0ba2
  Committed by lilong12 on Nov 24, 2020
  
  767d0ba2
- 【paddle.distributed.fleet】Optimize ParameterServer's Async Mode (#28442) · fbf9564f
  Committed by 123malin on Nov 24, 2020
```
* test=develop, optimize global_step
```
  fbf9564f
23 Nov, 2020 3 commits
- refactor momentum op to combine weight (#27414) · 8ff35506
  Committed by furnace on Nov 23, 2020
```
* refactor momentum op to combine weight_decay (scale op and sum op)
```
  8ff35506
- extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758) · bd1d6d3b
  Committed by Jacek Czaja on Nov 23, 2020
  
  bd1d6d3b
- fix truncated_gaussian seed (#28777) · 71c1cd14
  Committed by yaoxuefeng on Nov 23, 2020
  
  71c1cd14
20 Nov, 2020 10 commits
- Fix gpu memory allocation bug. (#28703) · 1dad8cea
  Committed by gongweibao on Nov 20, 2020
  
  1dad8cea
- fix occupied 0 device memory bug (#28771) · b969c32a
  Committed by Chen Weihang on Nov 20, 2020
  
  b969c32a
- add uint8 support for squeeze operator (#28734) · 1a532d51
  Committed by joejiong on Nov 20, 2020
```
Adding uint8 support for squeeze operator.
```
  1a532d51
- fix the number of perf algo for conv cudnn in exhaustive mode (#28694) · 8b853b30
  Committed by wangchaochaohu on Nov 20, 2020
  
  8b853b30
- Add bf16 matmul, fc, elementwise add and mul (#28729) · 8c0ea4bf
  Committed by joanna.wozna.intel on Nov 20, 2020
```
* Add bf16 matmul, fc, elementwise add and mul

* Correct unit test
```
  8c0ea4bf
- fix shuffle batch op shuffle (#28533) · 08b62f49
  Committed by yaoxuefeng on Nov 20, 2020
  
  08b62f49
- add kunlun kernel: slice, slice_grad, top_k, cast. *test=kunlun (#28542) · d3d1a6b6
  Committed by taixiurong on Nov 20, 2020
```
* 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api

* 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api
```
  d3d1a6b6
- Add LSTM, Simple RNN and GRU CPU kernel (#28577) · 9362d85e
  Committed by Jack Zhou on Nov 20, 2020
```
* add lstm, simple rnn op kernel

* fix the test_lstm for the rnn op

* change func name

* fix forward postprocess bug

* add gru forward, backward code

* remove unittest.skipIf; use a big rnn op instead of combination op

* fix input doesn't have gradient bug

* add eigen lstm forward, backward
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
```
  9362d85e
- adjust kunlun header file (#28536) · 30ef3815
  Committed by QingshuChen on Nov 20, 2020
```
* adjust kunlun header file
* test=kunlun

* update kunlun unittest
* test=kunlun

* update xpu unittest
* test=kunlun

* update xpu unittest
* test=kunlun

* update xpu unittest
* test=kunlun
```
  30ef3815
- improve performance of cast op (#28727) · dab49205
  Committed by Zhang Ting on Nov 20, 2020
  
  dab49205
19 Nov, 2020 2 commits
- fix truncated_gaussian op cuda seed setting (#28678) · 03f46e35
  Committed by yaoxuefeng on Nov 19, 2020
  
  03f46e35
- Add multi_gru op and tests (#28591) · 04bcc13f
  Committed by Wojciech Uss on Nov 19, 2020
```
* Add multi_gru op and tests

* removed redundant disable_dygraph()
```
  04bcc13f

BaiXuePrincess / Paddle is in sync with the fork's source project
