Commits · 14cf420ec2801531f506216e2e8353b05a97499f · BaiXuePrincess / Paddle

08 Dec, 2020 1 commit
- revert cast eigen kernel (#29445) · 14cf420e
  Committed by Zhang Ting on Dec 08, 2020
  
  14cf420e
07 Dec, 2020 2 commits
- polish the code of cumsum and remove some unused code (#29303) (#29423) · d77566b3
  Committed by wangchaochaohu on Dec 07, 2020
  
  d77566b3
- fix gpu outofrange (#29238) (#29348) · de3c067a
  Committed by tangwei12 on Dec 07, 2020
```
* fix gpu emb out of range

Change-Id: I5794ac73bd634d5ea069a6fbbd914274b6d6b7bf

* fix doc

Change-Id: I5a3350b2930a9ab2f52116c192b087307faf8fdf
```
  de3c067a
05 Dec, 2020 1 commit

Committed by chentianyu03 on Dec 05, 2020

* fix random failed of complex matmul

* Make transpose, trace, kron, reshape, sum op support complex type (#29321)

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

* kron, reshape, transpose support complex types

* sum and trace op support complex types

* add test case of sum and trace op

* fix the bug of imag part of complex not initialized

* format file

* format code style

* kron support type promotion; modify test cases

fbb6cd70

04 Dec, 2020 2 commits

fix tensorrt output shape error (#29308) (#29344) · 7a0602c8
Committed by Shang Zhizhou on Dec 04, 2020
```
* fix tensorrt output shape error

* fix unittest tensorrt_engine_op_test

* fix code style for unittest
```
7a0602c8

Support type promote for basic math ops (quantum required) (#29265) (#29354) · 0e7539e7

Committed by Chen Weihang on Dec 04, 2020

* basic impl of type promote

* add comment & another testcase

* fix complex bugs & support python op promote type

* fix failed unittests & polish code

* add unittest for coverage

* change to only promote complex type

* polish code details

* polish several comments

0e7539e7
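
The bullets above say promotion ended up restricted to complex types. A minimal sketch of what such a rule could look like follows. The function name `promote_if_complex` and the lookup table are hypothetical and simply follow NumPy-style widening; they are not taken from Paddle's source.

```python
# Hypothetical promotion table: result dtype when at least one operand is complex.
# Assumption: widening follows the usual float32 < float64, complex64 < complex128 order.
_PROMOTE = {
    ("float32", "complex64"): "complex64",
    ("float64", "complex64"): "complex128",
    ("float32", "complex128"): "complex128",
    ("float64", "complex128"): "complex128",
    ("complex64", "complex64"): "complex64",
    ("complex64", "complex128"): "complex128",
    ("complex128", "complex128"): "complex128",
}


def is_complex(dtype: str) -> bool:
    """Return True for the two complex dtype names used in this sketch."""
    return dtype in ("complex64", "complex128")


def promote_if_complex(lhs: str, rhs: str) -> str:
    """Pick the result dtype of a binary op, promoting only when a complex type is involved."""
    if not (is_complex(lhs) or is_complex(rhs)):
        # No complex operand: this sketch does no promotion and requires matching dtypes.
        if lhs != rhs:
            raise TypeError(f"dtype mismatch without complex operand: {lhs} vs {rhs}")
        return lhs
    # The table is stored for one operand order only, so try both.
    for key in ((lhs, rhs), (rhs, lhs)):
        if key in _PROMOTE:
            return _PROMOTE[key]
    raise TypeError(f"unsupported dtype pair: {lhs} vs {rhs}")
```

In this sketch a `float64` operand combined with `complex64` widens to `complex128` so that the real part keeps its precision, while two mismatched real dtypes are rejected rather than promoted.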

03 Dec, 2020 2 commits

fix shape of tile_grad op (#29289) (#29324) · 8cd8cd53
Committed by Leo Chen on Dec 03, 2020

8cd8cd53

[Cherry-pick] Add pure fp16 training with master weights. (#29301) · d8ea8a06

Committed by Zhen Wang on Dec 03, 2020

* Add pure fp16 training with master weights. (#27712)

* add the weight decay func for the momentum op

* Add the multi_precision function in Momentum Optimizer.

* Make sure that the initial values of the master weights are the same as the fp16 weights.

* add static loss scaling.

* add the rescale_grad function in the pure fp16 training.

* use the original momentum updating method.

* Polish some codes, such as variable names.

* add docstring for apis.

* update the var creation details of _create_master_weight.

* do not modify the code for imperative momentum updating.

* Fix the error of test_dist_sparse_tensor_load_momentum UT.

* add unit test for multi precision fp16 training.

* add more unit tests for CI.

* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.

d8ea8a06
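
The commit above combines several ideas: full-precision master weights, static loss scaling, a `rescale_grad` factor, and weight decay folded into the momentum update. The sketch below illustrates how these pieces could fit together for a single scalar weight. It is not Paddle's kernel: `momentum_step`, `to_fp16`, and the L2 form of weight decay are assumptions made for illustration, and Python floats stand in for the fp32 master copy.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value using only the stdlib."""
    return struct.unpack("e", struct.pack("e", x))[0]


def momentum_step(master_w, velocity, scaled_grad_fp16, lr,
                  mu=0.9, rescale_grad=1.0, weight_decay=0.0):
    """One momentum update performed on the full-precision master weight.

    Returns (new master weight, new velocity, fp16 copy of the new weight).
    """
    # Undo the loss scaling and add L2 weight decay (assumed form) in full precision.
    grad = scaled_grad_fp16 * rescale_grad + weight_decay * master_w
    velocity = mu * velocity + grad
    master_w = master_w - lr * velocity
    return master_w, velocity, to_fp16(master_w)


LOSS_SCALE = 128.0                       # static loss scale (illustrative value)
lr = 1e-4
scaled_grad = to_fp16(1.0 * LOSS_SCALE)  # gradient of the scaled loss, stored in fp16

master, vel, w16 = 1.0, 0.0, to_fp16(1.0)
naive_w16 = to_fp16(1.0)                 # baseline: weight kept only in fp16
for _ in range(10):
    master, vel, w16 = momentum_step(
        master, vel, scaled_grad, lr, mu=0.0, rescale_grad=1.0 / LOSS_SCALE
    )
    # Without a master copy, each 1e-4 step is rounded away near 1.0 in fp16.
    naive_w16 = to_fp16(naive_w16 - lr * (scaled_grad / LOSS_SCALE))
```

After the ten steps, the fp16-only weight is still exactly 1.0 because each update is smaller than half the fp16 spacing near 1.0, whereas the master weight has moved to about 0.999 and its fp16 copy follows it.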

01 Dec, 2020 2 commits

add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

8f45d142

fix lite unit test. (#29233) · 74c43ac6
Committed by Wilber on Dec 01, 2020

74c43ac6

30 Nov, 2020 5 commits
- Small optimizations for conv2d kernel subroutines. (#29188) · 4096ff94
  Committed by Adam Osewski on Nov 30, 2020
```
- Make sure that oneDNN memory descriptors are created only once, at the first iteration.
```
  4096ff94
- Update ps gpu (#29209) · b5c63423
  Committed by 123malin on Nov 30, 2020
```
* fix parameter prefetch & device guard
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: chengmo <chengmo@baidu.com>
```
  b5c63423
- prefetch optimize (#29095) · 03d4665f
  Committed by 123malin on Nov 30, 2020
```
* test=develop, optimize async prefetch
```
  03d4665f
- optimizer amp, all use fp16 communication, overlap last comm and compute (#28957) · 0c2a51d2
  Committed by WangXi on Nov 30, 2020
  
  0c2a51d2
- fix gru gcc7.4 bug for the gru compile · bc6033f8
  Committed by Jack Zhou on Nov 30, 2020
```
fix gru gcc7.4 bug for the gru compile
```
  bc6033f8
28 Nov, 2020 1 commit
- optimize cumsum OP (#29193) · b818429a
  Committed by wangchaochaohu on Nov 28, 2020
  
  b818429a
27 Nov, 2020 4 commits

update expand as op to use the shape of the target tensor instead of the... · 7e5e9934
Committed by lilong12 on Nov 27, 2020
```
update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020)

* update, test=develop
```
7e5e9934
Add eigen gru and fix the dropout bug in the rnn · 085260f3
Committed by Jack Zhou on Nov 27, 2020
```
Add eigen gru and fix the dropout bug in the rnn 
```
085260f3
Fixes mkldnn dygraph learning rate scheduler crashes (#28988) · bc902044
Committed by arlesniak on Nov 27, 2020

bc902044

detect tensorRT plugin fp16 in runtime (#27933) · b9e76a01

Committed by Shang Zhizhou on Nov 27, 2020

* remove -DSUPPORTS_CUDA_FP16 in cuda.cmake

* compile with cuda9

* add some unittest

* notest;test=coverage

* add unittest for trt plugin swish && split

* update ernie unittest

* fix some error message

* remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter

* fix compile error when CUDA_ARCH_NAME < Pascal

* fix compile error

* update unittest timeout

* compile with cuda9

* update error msg

* fix code style

* add some comments

* add define IF_CUDA_ARCH_SUPPORT_FP16

* rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED

b9e76a01

26 Nov, 2020 2 commits
- Fix ops doc for some ops · da71173b
  Committed by Noel on Nov 26, 2020
```
Fix ops doc for some ops 
```
  da71173b
- Add bf16 pool2d and unify bf16 unit tests (#29039) · b0d1ac16
  Committed by joanna.wozna.intel on Nov 26, 2020
```
* Add bf16 pool2d and unify bf16 unit tests

* Add change default ops test
```
  b0d1ac16
25 Nov, 2020 4 commits
- add uint8 for reshape op (#28996) · 582c0a04
  Committed by joejiong on Nov 25, 2020
```
add uint8 for reshape operator
```
  582c0a04
- add xpu elementwise ops (#29031) · a5aa4dc7
  Committed by taixiurong on Nov 25, 2020
  
  a5aa4dc7
- Update pow (#29000) · b04c78ef
  Committed by joejiong on Nov 25, 2020
```
Simple code clean up
```
  b04c78ef
- remove eigen threadpool for the speed up · b2c8a007
  Committed by wawltor on Nov 25, 2020
```
remove eigen threadpool for the speed up
```
  b2c8a007
24 Nov, 2020 2 commits
- update, test=develop (#28700) · 767d0ba2
  Committed by lilong12 on Nov 24, 2020
  
  767d0ba2
- [paddle.distributed.fleet] Optimize ParameterServer's Async Mode (#28442) · fbf9564f
  Committed by 123malin on Nov 24, 2020
```
* test=develop, optimize global_step
```
  fbf9564f
23 Nov, 2020 3 commits
- refactor momentum op to combine weight (#27414) · 8ff35506
  Committed by furnace on Nov 23, 2020
```
* refactor momentum op to combine weight_decay (scale op and sum op)
```
  8ff35506
- extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758) · bd1d6d3b
  Committed by Jacek Czaja on Nov 23, 2020
  
  bd1d6d3b
- fix truncated_gaussian seed (#28777) · 71c1cd14
  Committed by yaoxuefeng on Nov 23, 2020
  
  71c1cd14
20 Nov, 2020 9 commits
- Fix gpu memory allocation bug. (#28703) · 1dad8cea
  Committed by gongweibao on Nov 20, 2020
  
  1dad8cea
- fix occupied 0 device memory bug (#28771) · b969c32a
  Committed by Chen Weihang on Nov 20, 2020
  
  b969c32a
- add uint8 support for squeeze operator (#28734) · 1a532d51
  Committed by joejiong on Nov 20, 2020
```
Adding uint8 support for squeeze operator.
```
  1a532d51
- fix the number of perf algo for conv cudnn in exhaustive mode (#28694) · 8b853b30
  Committed by wangchaochaohu on Nov 20, 2020
  
  8b853b30
- Add bf16 matmul, fc, elementwise add and mul (#28729) · 8c0ea4bf
  Committed by joanna.wozna.intel on Nov 20, 2020
```
* Add bf16 matmul, fc, elementwise add and mul

* Correct unit test
```
  8c0ea4bf
- fix shuffle batch op shuffle (#28533) · 08b62f49
  Committed by yaoxuefeng on Nov 20, 2020
  
  08b62f49
- add kunlun kernel: slice, slice_grad, top_k, cast. *test=kunlun (#28542) · d3d1a6b6
  Committed by taixiurong on Nov 20, 2020
```
* 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api

* 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api
```
  d3d1a6b6
- Add LSTM, Simple RNN and GRU CPU kernel (#28577) · 9362d85e
  Committed by Jack Zhou on Nov 20, 2020
```
* add lstm, simple rnn op kernel

* fix the test_lstm for the rnn op

* change func name

* fix forward postprocess bug

* add gru forward, backward code

* remove unittest.skipIf; use a big rnn op instead of combination op

* fix input doesn't have gradient bug

* add eigen lstm forward, backward
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
```
  9362d85e
- adjust kunlun header file (#28536) · 30ef3815
  Committed by QingshuChen on Nov 20, 2020
```
* adjust kunlun header file
*test=kunlun

* update kunlun unittest
*test=kunlun

* update xpu unittest
* test = kunlun

* update xpu unittest
* test=kunlun

* update xpu unittest
* test=kunlun
```
  30ef3815
