Commits · 9ad800ebb2a8b32c28e5440d2145ff053219389d · Crayon鑫 / Paddle

Dec 04, 2020 · 1 commit

Support type promote for basic math ops (quantum required) (#29265) · 9ad800eb

Committed by Chen Weihang on Dec 04, 2020

* basic impl of type promote

* add comment & another testcase

* fix complex bugs & support python op promote type

* fix failed unittests & polish code

* add unittest for coverage

* change to only promote complex type

* polish code details

* polish several comments
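After its final revision, this commit promotes types only when a complex operand is involved. The pure-Python sketch below illustrates what such a rule can look like. The dtype strings and the function `promote_for_complex` are hypothetical. The mapping is an assumption that follows NumPy's results for these pairs: a real operand is lifted to the complex type of the same precision, and the wider complex type wins. The commit does not spell out Paddle's exact table.

```python
# Hypothetical rule table; Paddle's real dtype enum and promotion table may differ.
_COMPLEX_OF = {"float32": "complex64", "float64": "complex128"}
_COMPLEX_RANK = {"complex64": 0, "complex128": 1}


def promote_for_complex(x_dtype: str, y_dtype: str) -> str:
    """Result dtype of a binary math op under the assumed complex-only rule."""
    x_is_complex = x_dtype in _COMPLEX_RANK
    y_is_complex = y_dtype in _COMPLEX_RANK
    if not (x_is_complex or y_is_complex):
        # No complex operand: no implicit promotion, so dtypes must already agree.
        if x_dtype != y_dtype:
            raise TypeError(f"no implicit promotion between {x_dtype} and {y_dtype}")
        return x_dtype
    # Lift each real operand to the complex type of the same precision...
    cx = x_dtype if x_is_complex else _COMPLEX_OF[x_dtype]
    cy = y_dtype if y_is_complex else _COMPLEX_OF[y_dtype]
    # ...then keep the wider of the two complex types.
    return cx if _COMPLEX_RANK[cx] >= _COMPLEX_RANK[cy] else cy


print(promote_for_complex("float32", "complex64"))  # complex64
print(promote_for_complex("float64", "complex64"))  # complex128
```

Restricting promotion to the complex case keeps existing real-valued code paths unchanged. Mixed real dtypes such as float32 with float64 still raise an error instead of being silently widened.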


Dec 03, 2020 · 7 commits

fix gpu outofrange (#29238) · 83587916
Committed by tangwei12 on Dec 03, 2020

* fix gpu emb out of range

Change-Id: I5794ac73bd634d5ea069a6fbbd914274b6d6b7bf

* fix doc

Change-Id: I5a3350b2930a9ab2f52116c192b087307faf8fdf

use has_grad instead of train_mode (#29309) · b58cfff8
Committed by Leo Chen on Dec 03, 2020

* use has_grad instead of train_mode

* add vlog for debug

* fix ut

* fix ut

improve elementwise_add_grad perf (#29277) · befd6d53
Committed by Zhang Ting on Dec 03, 2020

* improve performance of elementwise_sum_grad

fix tensorrt output shape error (#29308) · ebf68919
Committed by Shang Zhizhou on Dec 03, 2020

* fix tensorrt output shape error

* fix unittest tensorrt_engine_op_test

* fix code style for unittest

[Dy2Stat] Add cache for Executor and Context in run_program_op (#28421) · 67c700b4
Committed by Aurelius84 on Dec 03, 2020

fix the warning of reducer (#29323) · 696dc4bb
Committed by ShenLiang on Dec 03, 2020

polish the code of cumsum and remove some unused code (#29303) · c4be80f4
Committed by wangchaochaohu on Dec 03, 2020

Dec 02, 2020 · 8 commits

fix analysis_config bug. (#29304) · d68af02c
Committed by Wilber on Dec 02, 2020

enforce the matmul_v2 error message (#29297) · 0fb18bc2
Committed by ShenLiang on Dec 02, 2020

Remove some useless log. (#29300) · 9b59a589
Committed by Zhen Wang on Dec 02, 2020

fix shape of tile_grad op (#29289) · 13a22a37
Committed by Leo Chen on Dec 02, 2020

Add pure fp16 training with master weights. (#27712) · be3777a5

Committed by Zhen Wang on Dec 02, 2020

* add the weight decay func for the momentum op

* Add the multi_precision function in Momentum Optimizer.

* Make sure that the initial value of master weights are same with the fp16 weights.

* add static loss scaling.

* add the rescale_grad function in the pure fp16 training.

* use the original momentum updating method.

* Polish some codes, such as variable names.

* add docstring for apis.

* update the var creation details of _create_master_weight.

* not modify codes about imperative momentum updating.

* Fix the error of test_dist_sparse_tensor_load_momentum UT.

* add unit test for multi precision fp16 training.

* add more unit tests for CI.

* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.

* For CI Coverage Checking.
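The bullets above outline pure fp16 training with master weights: the forward and backward passes use fp16 weights, while the optimizer updates a high-precision master copy and folds `rescale_grad` and weight decay into the momentum step. The pure-Python sketch below illustrates why the master copy matters. The function `master_momentum_step` is hypothetical, not Paddle's kernel. It assumes the plain (non-Nesterov) momentum rule v = mu * v + g, p = p - lr * v, with L2-style weight decay added to the rescaled gradient.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def master_momentum_step(master, velocity, grad, lr, mu=0.9,
                         rescale_grad=1.0, weight_decay=0.0):
    """One momentum update applied to the high-precision master weight.

    Python floats are float64; here they stand in for the fp32 master copy.
    Returns (new_master, new_velocity, fp16_weight), where fp16_weight is the
    low-precision copy the next fp16 forward/backward pass would use.
    """
    g = grad * rescale_grad + weight_decay * master  # rescale, then L2 decay
    velocity = mu * velocity + g
    master = master - lr * velocity
    return master, velocity, to_fp16(master)


# Start the master weight from the fp16 weight so both copies agree initially.
w16 = to_fp16(1.0)
master, velocity = w16, 0.0
for _ in range(100):
    master, velocity, w16 = master_momentum_step(master, velocity, 1.0, lr=1e-4, mu=0.0)

# The same updates applied directly to an fp16 weight, for comparison.
naive = to_fp16(1.0)
for _ in range(100):
    naive = to_fp16(naive - to_fp16(1e-4 * 1.0))

print(master, w16, naive)
```

The fp16-only weight never moves off 1.0. Each 1e-4 step is smaller than half the fp16 spacing just below 1.0 (2^-11), so every update rounds away. The master copy accumulates the steps, and its fp16 image tracks it.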


change import math.h to cmath (#29260) · 6673fb05
Committed by Wojciech Uss on Dec 02, 2020

Layer norm fp16 (#29169) · 7584bb50

Committed by furnace on Dec 02, 2020

* add fp16 for layer_norm op

* revert layernorm api

* fix forward

* fix forward

* fix backward for layernorm with fp16

* fix unit test for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>

* fix with_mkldnn compile error for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>


fix cmake error when WITH_GPU=ON and WITH_TENSORRT=ON && WITH_MKL=OFF (#29275) · c59b4f28
Committed by Shang Zhizhou on Dec 02, 2020

Dec 01, 2020 · 9 commits

Improve performance of elementwise_add grad op (#29187) · 116305ea

Committed by Leo Chen on Dec 01, 2020

* pass stop_gradient for cast op

* improve performance of elementwise_add grad

* use tensor copy async

* dygraph branch

* fix dygraph branch

* add ut


add deformable_conv op on xpu (#29234) · 07c67d5a
Committed by 卖鱼的哲学 on Dec 01, 2020

* rebase develop

* update deformable_conv op on xpu

* update deformable_conv op on xpu

Hot fix compile failed in gcc4.8 caused by complex impl (#29254) · 1de32f82
Committed by Chen Weihang on Dec 01, 2020

* hot fix compile failed in gcc4.8

* fix failed unittest

Fix a bug when running on an operating system without "bash." (#29131) · 642abe2a
Committed by GeminiCarrie on Dec 01, 2020

* Fix a bug when running on an operating system without "bash."

* add execution condition

* for ci-coverage

Change the api of DataParallel and Fleet (#29224) · 46b73e6c
Committed by ShenLiang on Dec 01, 2020

update kunlun conv2d/softmax/elementwise implementation (#29229) · 64f29fbb

Committed by QingshuChen on Dec 01, 2020

* update conv2d & softmax to new xpu api
* test=kunlun

* remove useless comments
* test=kunlun

* remote softmax xpu op
* test=kunlun

* update kunlun softmax
* test=kunlun

* update xpu unittest
* test=kunlun

* fix elementwise_grad bug for kunlun
* test=kunlun


add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest


accumulate gradient for leaf tensor with previous graph and expose leaf tensor concept (#28429) · c0a991c8

Committed by Zhou Wei on Dec 01, 2020

* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor

* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor

* fix coverage

* fix api doc

* fix CI unittest

* fix CI unittest

* fix unittest

* empty tensor doesn't need inner_var_

* fix some error message


fix lite unit test. (#29233) · 74c43ac6
Committed by Wilber on Dec 01, 2020

Nov 30, 2020 · 10 commits

Small optimizations for conv2d kernel subroutines. (#29188) · 4096ff94
Committed by Adam Osewski on Nov 30, 2020

- Make sure that oneDNN memory descriptors are created only once at first iteration.

Enable all image classification models (#29155) · 5c61eeef
Committed by joanna.wozna.intel on Nov 30, 2020

[Lite-Subgraph] Fix compile error for lite subgraph. (#29146) · 4fec182d
Committed by Wilber on Nov 30, 2020

Update ps gpu (#29209) · b5c63423

Committed by 123malin on Nov 30, 2020

* fix parameter prefetch & device guard
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: chengmo <chengmo@baidu.com>


Check whether there is any inplace operation affecting gradient calculation. (#27901) · 865a4598

Committed by liym27 on Nov 30, 2020

* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.

* Add a new attribute `_inplace_version` for VarBase.

* Raise exception if an inplace operation can result in incorrect gradient computation.

* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.

* For api assign, call _bump_inplace_version() when it's an inplace operation in dynamic mode.

* Use original var_wrapper if the inplace_version is not changed.

* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performance.
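The mechanism described above works like this: each tensor carries an inplace version counter, every inplace write bumps it, and backward refuses to use a saved tensor whose version changed after it was saved. The pure-Python sketch below illustrates the idea. The classes `VersionedTensor` and `SavedTensor` and the function `square_backward` are hypothetical stand-ins, loosely modelled on the `TensorInplaceVersion`, `_inplace_version`, and `_bump_inplace_version()` names in the commit message. They are not Paddle's implementation.

```python
class VersionedTensor:
    """Hypothetical tensor wrapper carrying an inplace version counter."""

    def __init__(self, data):
        self.data = list(data)
        self._inplace_version = 0

    def bump_inplace_version(self):
        self._inplace_version += 1

    def fill_(self, value):
        # Any inplace write must bump the version so saved snapshots can detect it.
        self.data = [value] * len(self.data)
        self.bump_inplace_version()


class SavedTensor:
    """Snapshot recorded when a forward op saves an input for its backward pass."""

    def __init__(self, tensor):
        self.tensor = tensor
        self.saved_version = tensor._inplace_version

    def unpack(self):
        if self.tensor._inplace_version != self.saved_version:
            raise RuntimeError(
                f"tensor was modified inplace (version {self.saved_version} -> "
                f"{self.tensor._inplace_version}); the gradient would be wrong")
        return self.tensor.data


def square_backward(saved, grad_out):
    # d(x^2)/dx = 2x, so the backward pass needs the original values of x.
    x = saved.unpack()
    return [2 * xi * g for xi, g in zip(x, grad_out)]


x = VersionedTensor([1.0, 2.0])
saved = SavedTensor(x)                     # forward of y = x * x saves x
print(square_backward(saved, [1.0, 1.0]))  # [2.0, 4.0]
x.fill_(0.0)                               # inplace write after the save
# square_backward(saved, [1.0, 1.0]) would now raise RuntimeError
```

Comparing a single integer is cheap. This is in the spirit of the last two bullets: an unchanged version lets the original wrapper be reused without copying.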


prefetch optimize (#29095) · 03d4665f
Committed by 123malin on Nov 30, 2020

* test=develop, optimize async prefetch

optimizer amp, all use fp16 communication, overlap last comm and compute (#28957) · 0c2a51d2
Committed by WangXi on Nov 30, 2020

Polish unittests details and execution conditions to adapt to MUSL (#29044) · 0b032fae

Committed by Chen Weihang on Nov 30, 2020

* fix failed tests in yingchun given list

* add unittests into static_mode_white_list

* add enable static

* fix dist unittest

* skip test_sigmoid_focal_loss_op & add gym

* revert no need skip unittests

* remove gym


Add quantization of multi_gru op and tests (#28615) · 4fd4095d
Committed by Wojciech Uss on Nov 30, 2020

fix gru gcc7.4 bug for the gru compile · bc6033f8
Committed by Jack Zhou on Nov 30, 2020

fix gru gcc7.4 bug for the gru compile

Nov 28, 2020 · 1 commit

optimize cumsum OP (#29193) · b818429a
Committed by wangchaochaohu on Nov 28, 2020

Nov 27, 2020 · 4 commits

Support dynamic graph distributed (#28997) · e2d01eb6

Committed by ShenLiang on Nov 27, 2020

* add reducer

* refine event for memorycopy

* add concat&split for allreduce

* apply concat & split for fuse tensor

* fix nccl dep

* fix the unittest, compile problem and ddp initialize problem

* fix unittest for mac & add some comments & solve the repeated param in sublayers

* fix unittest for windows & fix document


update expand as op to use the shape of the target tensor instead of the... · 7e5e9934
Committed by lilong12 on Nov 27, 2020

update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020)

* update, test=develop

fix CUDA 11 error on windows (#29101) · e668cb07
Committed by Zhou Wei on Nov 27, 2020

Add eigen gru and fix the dropout bug in the rnn · 085260f3
Committed by Jack Zhou on Nov 27, 2020

Add eigen gru and fix the dropout bug in the rnn

Crayon鑫 / Paddle is identical to the project it was forked from
