Commits · e59524f86d472f0f36e09cc41c3ca882d3fc2841 · Crayon鑫 / Paddle

Jan 11, 2021 · 11 commits

[cherry-pick]Elementwise add grad GPU kernel optimization (#30276) · e59524f8

Committed by wangchaochaohu on Jan 11, 2021

* elementwise_add_grad Op optimization  (#29575)

* optimize for long width for elementwise (#29602)

* refine (#29622)

* delete the code for fp16 optimization because it is not faster than common template code (#29715)

* fix the shape choose of vectorize for cuda

* optimization for fp16 elementwise add (#29744)

* Fix the compiler error for half type (#29799)

* refine the compiler error for half2 operation (#29816)

* fix the compiler error when gcc4 cuda9.0 (#29997)

e59524f8

[Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54

Committed by Zhen Wang on Jan 11, 2021

* Support pure fp16 training for AMP API. (#29544)

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefine error about MPTypeTrait on windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Improve the doc of `amp_init`.

* Change for fp16 testing if users have the infer program defined in separate way.

* Remove tensor copy in the update_loss_scaling op. (#29426)

* remove tensor copy in the update_loss_scaling op

* not use thrust.

* fix some cuda memory access error.

d8dfef54

[Cherry pick] improve dropout (#30260) · b4931ab1

Committed by Zhang Ting on Jan 11, 2021

* improve dropout (#29465)

* improve drop out

* add VectorizedRandomGeneratorWithGenerator

* fix bug

* modify according to comments

* improve dropout grad (#29605)

* improve grad perf

* fix the bug of dropout_grad (#29813)

b4931ab1

[cherry-pick] softmax optimize (#30279) · b80beb16

Committed by GaoWei8 on Jan 11, 2021

* Softmax vectorization (#29404)

* vec softmax fw

* vec softmax bw

* add a message argument for compiler compatibility

* optimize softmax forward (#30217)

* optimize softmax forward
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

b80beb16

Cherry-pick 30194 30164 30201(#30202) · 36de178a
Committed by Wilber on Jan 11, 2021

36de178a
Skip convert tensor shape while using Paddle.shape (#30223) (#30239) · 55604248
Committed by Aurelius84 on Jan 11, 2021
```
* fix tensor shape bug

* fix op_num

* clean code
```
55604248
Quantization supports 2.0 APIs (#30036) (#30257) · 393a91f1
Committed by guofei on Jan 11, 2021
```
* Quantization supports 2.0 APIs

* Fix the error of save_quantized_model
```
393a91f1

[cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f

Committed by WangXi on Jan 11, 2021

* Optimization grad merge performance (#29784)

* [fleet] combine amp and gradient merge, test=develop (#30086)

* fix assign_op_xpu concat_op_xpu warning (#30120)
Co-authored-by: liuyuhui <liuyuhui@baidu.com>

e283dc6f

[Cherry-pick] remove distributed prepare context (#30219) (#30256) · 1fa98c5d
Committed by Chen Weihang on Jan 10, 2021
```
att, cherry-pick of #30219
```
1fa98c5d
[cherry-pick] clean redundant API alias in 2.0 - part 2 (#30244) · 70cbde83
Committed by XiaoguangHu on Jan 10, 2021
```
* fix dynamic to static error

* delete paddle.nn.functional.assign
```
70cbde83

add aarch64 and sunway kunlun lib (#30027) (#30237) · eacbd488

Committed by QingshuChen on Jan 11, 2021

* add aarch64 and sunway kunlun lib

* minor

* optimize elementwise_add for kunlun

* update kunlun dependence

* minor

* minor

eacbd488

Jan 10, 2021 · 1 commit

fix adamw apply gradient (#30130) (#30207) · c4cd99f3

Committed by WangXi on Jan 10, 2021

c4cd99f3

Jan 09, 2021 · 1 commit

fix pad (#30231) · 6d1fb79d

Committed by littletomatodonkey on Jan 09, 2021

6d1fb79d

Jan 08, 2021 · 11 commits

[Cherry-Pick 2.0] In creation.assign, reuse implementation code of... · 8e788e27

Committed by liym27 on Jan 08, 2021

[Cherry-Pick 2.0] In creation.assign, reuse the implementation code of layers.tensor.assign to avoid maintaining two copies of the code (#30227) (#30236)

cherry-pick #30227

8e788e27

[cherry-pick] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not... · 2ba9bdd7

Committed by liym27 on Jan 08, 2021

[cherry-pick] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not negative #29965 (#30235)

* [Cherry-Pick 2.0] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not negative (#29965)

1. When x is a Variable, call nn.shape(x) only in the following cases:
 1) The shape of x is used in a control flow condition.
 2) The dim to be used is negative.
2. When x is a Variable, but x.shape or x.shape[idx] doesn't contain a negative value, don't convert to paddle.shape()
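
The conversion rule above can be read as a small predicate. The following is a minimal sketch in plain Python, not Paddle's actual transformer code: `needs_runtime_shape` and its parameters are hypothetical names, and a negative entry in the static shape is assumed to mark a dimension that is unknown until run time.

```python
def needs_runtime_shape(static_shape, idx=None, in_control_flow=False):
    """Decide whether x.shape should become a runtime shape query.

    static_shape: list of ints known at transform time; negative means unknown.
    idx: optional integer index when the code reads x.shape[idx].
    in_control_flow: True when the shape feeds a control flow condition.
    """
    # Case 1: shapes used in control flow conditions are always converted.
    if in_control_flow:
        return True
    # Case 2: convert only when a dim that is actually used is negative.
    used_dims = static_shape if idx is None else [static_shape[idx]]
    return any(dim < 0 for dim in used_dims)
```

Under these assumptions, reading `x.shape[1]` of a `[-1, 3]` Variable keeps the static value 3, while reading `x.shape[0]` requires the runtime query.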

* [Cherry-Pick 2.0] [Dy2Stat] Use Paddle2.0 api paddle.tensor.array_* (#30156)

2ba9bdd7

[Cherry-pick] amp related PR cherry pick into Release/2.0 (#30212) · 9f7c66b4

Committed by huangxu96 on Jan 08, 2021

* Optimizer trans momentum (#29597)

* merge amp related function in Momentum from paddle.fluid.contrib.optimizer into paddle.optimizer.

* Add unittest for 2.0  Momentum API.

* fix some bugs in weight_decay.

* add alias for fluid.contrib.mixed_precision (#29562)

* add alias for fluid.contrib.mixed_precision

* add static.amp into setup.py.in (#29621)

* add static.amp into setup.py.in

* add unittest for api

* fix a bug in multi_precision_fp16 unittest. (#29756)

9f7c66b4

[cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negative,... · 5fe3da39

Committed by liym27 on Jan 08, 2021

[cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns a wrong result (#30003) (#30146)

1. When slice_item is a slice:
 1) the start of __getitem__ should be std::max(start, 0)
 2) the end of __getitem__ should be std::min(end, dim)
2. When slice_item is an integer, it should be in [-dim_len, dim_len)
3. Fix the error message to report accurate data
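
The clamping rules above can be illustrated with a short sketch in plain Python. This is not the C++ code from the PR: `normalize_slice` and `normalize_index` are hypothetical helper names, a step of 1 is assumed, and shifting negative bounds by the axis length before clamping is an assumption that mirrors ordinary Python slicing.

```python
def normalize_slice(start, end, dim_len):
    """Return clamped (start, end) for one axis of length dim_len, step 1."""
    # Assumption: negative bounds count from the end of the axis.
    if start < 0:
        start += dim_len
    if end < 0:
        end += dim_len
    # Rule 1: clamp start to at least 0 and end to at most dim_len.
    return max(start, 0), min(end, dim_len)


def normalize_index(index, dim_len):
    """Return a non-negative index, rejecting values outside [-dim_len, dim_len)."""
    # Rule 2: an integer index must lie in [-dim_len, dim_len).
    if not -dim_len <= index < dim_len:
        # Rule 3: report the actual offending values in the message.
        raise IndexError(
            f"index {index} is out of range for axis of length {dim_len}"
        )
    return index + dim_len if index < 0 else index
```

For an axis of length 5, `normalize_slice(-10, -1, 5)` yields the same element range that Python's `range(5)[-10:-1]` selects.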

5fe3da39

[Cherry-Pick 2.0][setitem] Support Tensor setitem in static mode (#29708) (#30104) · f46ddc0e

Committed by liym27 on Jan 08, 2021

1. Type of index: int, slice(step must be 1).

2. Type of value:
 (1) int32, int64, float32, bool;
 (2) numpy.array(int32, int64, float32, bool);<Note: float64 is not supported>
 (3) paddle.Tensor(int32, int64, float32, float64, bool);

f46ddc0e

Fix beam search bug (#29824) (#30140) · b2ca2cad

Committed by Jiaqi Liu on Jan 08, 2021

* fix beam search bug

* add dygraph unittest

* update dynamic_decode argument doc

* add warning info for state which has no lengths attribute

b2ca2cad

[Cherry-pick] [Complex] Simplify prepared op impl to improve performance (#30153) (#30215) · 0e3a1d35

Committed by Chen Weihang on Jan 08, 2021

* simplify prepared op impl to improve performance

* fix kunlun compile error

* continue fix kunlun compile error

* only transform diff place when dtype diff

* fix failed unittests

* remove useless file

* polish impl by review comment

0e3a1d35

【2.0API CherryPick】LookAhead, ModelAverage, IndexSelect (#30205) · 3ce4d34d

Committed by 123malin on Jan 08, 2021

* Add Lookahead and ModelAverage Optimizer (#30004)

* test=develop, add model_average and lookahead

* Improve Index select cuda kernel (#30139)

* test=develop, add index_select_cuda kernel

3ce4d34d

fix paddle.pow doc, test=document_fix (#30159) (#30213) · 8d3648c8
Committed by LutaoChu on Jan 08, 2021

8d3648c8
fix syncbn convert (#30158) (#30176) · 030d678c
Committed by ceci3 on Jan 08, 2021
```
* fix syncbn convert

* add unittest
```
030d678c

[Cherry-pick] Simplify the options of spawn based on fleetrun (#30144) (#30197) · 39204d56

Committed by Chen Weihang on Jan 07, 2021

* Simplify the options of spawn based on fleetrun (#30144)

* Simplify the options of spawn based on fleetrun

* polish details

* polish doc details

* cleanup enum test=develop (#29294)
Co-authored-by: gongweibao <weibao.gong@gmail.com>

39204d56

Jan 07, 2021 · 8 commits

[cherry pick] Fix bug in paddle.save/load and paddle.static.save/load when saving large files (#30170) · bfb6f613

Committed by WeiXin on Jan 07, 2021

* Support storage of large parameters (#29988)

* Support storage of large parameters

* Reduce the complexity of the unittest

* Reduce the complexity of the unittest,commented out unittest for

* add unittest for static.save/load

* Increase the timeout threshold of 'test_static_save_load'

* Increase the timeout threshold of 'test_static_save_load'

* Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'

* Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'

* Extend the timeout for the (#30151)

bfb6f613

fix error message (#30135) (#30182) · 9f02c284
Committed by ShenLiang on Jan 07, 2021

9f02c284

[cherry pick] Some optimizations of elementwise_add, gelu and dropout for AMP (#30152) · 07f68fad

Committed by Leo Chen on Jan 07, 2021

* Improve performance of elementwise_add grad op (#29187)

* pass stop_gradient for cast op

* improve performance of elementwise_add grad

* use tensor copy async

* dygraph branch

* fix dygraph branch

* add ut

* make gelu fp16 computing more robust (#29484)

* Add fast path for dropout when p == 0  (#29553)

* add fast path for p == 0 in dropout

* add ut

07f68fad

[Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63

Committed by furnace on Jan 07, 2021

* Layer norm fp16 (#29169)

* add fp16 for layer_norm op

* revert layernorm api

* fix forward

* fix forward

* fix backward for layernorm with fp16

* fix unit test for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>

* fix with_mkldnn compile error for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* fix layer_norm accuracy (#29434)

* Layernorm opt (#29522)

* layernorm fw opt

* layernorm bw opt

* fix typo, test=develop

* remove const dim3 for windows CI compatibility

* merge develop
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

* Fix compile problem when cuda_arch < 6000 (#29576)

* fix compile problem when cuda_arch < 6000

* refine code

* refine code
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

44b81e63

add inference api: DisableTensorRtOps (#30109) (#30178) · cb71fea0
Committed by Shang Zhizhou on Jan 07, 2021

cb71fea0
pre padding in dygraph (#30179) · a2b0357d
Committed by tangwei12 on Jan 07, 2021
```
Change-Id: Ia5279b0cbb6a5b3970aff66e9510e0d85efa70ce
```
a2b0357d
fix xpu pe sync, test=notest (#30095) (#30114) · 85545bbc
Committed by liuyuhui on Jan 07, 2021

85545bbc

Cherry pick bn (#30136) · 157ff094

Committed by ceci3 on Jan 07, 2021

* fix bn docs (#30096)

* add attribute for batch_norm (#29950)

* add attribute for batch_norm

157ff094

Jan 06, 2021 · 5 commits

support dygraph in xpu place (#30051) (#30112) · 285f33e5

Committed by hong on Jan 06, 2021

* support dygraph in xpu place; test=develop

* fix cpu/gpu compile error; test=develop

* fix compile error; test=develop

* fix xpu compile error; testd=develop

285f33e5

Cherrypick 30071 (#30074) · 19bec2fe
Committed by gongweibao on Jan 06, 2021
```
* fix log test=release/2.0

* fix ut test=develop
```
19bec2fe

[Cherry-pick]cherry-pick to Release/2.0 (#30076) · 1ad7fcbf

Committed by huangxu96 on Jan 06, 2021

* add fp16 check into max and avg pool (#29479)

* Add ReserveSpace in dygraph batch_norm. (#29221)

* Add ReserveSpace in dygraph batch_norm.

* Add unittest for reservespace

* add float16 into adaptive_avg_pool2d check list. (#29547)

1ad7fcbf

[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for... · 743649b5

Committed by liym27 on Jan 06, 2021

[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842) (#30105)

Before this PR, SharePlaceHolderWith shared a Tensor between different C++ Variables, which means sharing the data, shape, and inplace_version_counter_ of the Tensor.
However, some cases need to share the data and inplace_version_counter_ but not the shape. For example, the inplace op reshape can't share the shape.

This PR discards SharePlaceHolderWith and exposes ShareInplaceVersionCounterWith for C++ Tensor.
This reverts commit b10ecd9d.

* Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase
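
The idea of sharing data and an in-place version counter while keeping separate shapes can be modeled in a few lines of plain Python. This is a toy illustration only, not Paddle's C++ implementation: `VersionCounter`, `ToyTensor`, `reshape_view`, and `inplace_fill` are hypothetical names introduced here.

```python
class VersionCounter:
    """Counts in-place modifications of an underlying buffer."""

    def __init__(self):
        self.version = 0

    def bump(self):
        self.version += 1


class ToyTensor:
    """Toy stand-in for a tensor: a flat buffer, a shape, a version counter."""

    def __init__(self, data, shape):
        self.data = data              # flat Python list acting as the buffer
        self.shape = tuple(shape)     # each tensor owns its shape
        self.counter = VersionCounter()

    def share_inplace_version_counter_with(self, src):
        # Both tensors now observe the same modification count.
        self.counter = src.counter


def reshape_view(src, new_shape):
    """Build a tensor that shares src's buffer and counter but not its shape."""
    out = ToyTensor(src.data, new_shape)  # same list object, so data is shared
    out.share_inplace_version_counter_with(src)
    return out


def inplace_fill(tensor, value):
    """Overwrite every element in place and record the modification."""
    for i in range(len(tensor.data)):
        tensor.data[i] = value
    tensor.counter.bump()
```

In this model, an in-place write through the reshaped tensor is visible in the source tensor's data and version, while the two shapes stay independent.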

743649b5

[Cherry-pick 2.0] Migrate 4 APIs about array to paddle.tensor.* (#29565) (#30101) · 52caf787
Committed by liym27 on Jan 06, 2021
```
4 APIs: array_length, array_read, array_write, create_array, cherry-pick #29565
```
52caf787

Jan 05, 2021 · 3 commits

add topo-aware in heter-ps (#30087) (#30117) · 7fc2ce50
Committed by Thunderbrook on Jan 05, 2021
```
* add topo aware

* resource.h

* topo aware

* format
```
7fc2ce50

[Cherry-pick 2.0] cherry pick 3 PRs about Dynamic-to-Static (#30100) · faeee3c3

Committed by liym27 on Jan 05, 2021

* [cherry-pick 2.0] Fix unittest test_slice (#29740)

Before this commit, test_slice used the old API `dygraph_to_static_func` for Dynamic-to-Static and called Executor explicitly, which is not recommended for users.
After the fix, it uses the recommended API `paddle.jit.to_static` instead of `dygraph_to_static_func`, which won't trigger the random exception on coverage CI.

* [cherry-pick 2.0][Dy2Stat] Support grammar: for ele in var[idx] (#29541)

Support transforming `for ele in var` statements in which var is a slice of a Tensor.

* [cherry-pick 2.0][Dy2Stat] Fix bug for loop: a variable is used and created in loop, but used before created (#29769)

faeee3c3

[cherry-pick 2.0] Support dygraph quant model and avoid the scale to be infinity (#30098) · 3fe71d0a

Committed by cc on Jan 05, 2021

* fix infinite scale values (#29386)

* Support dygraph quant model (#29927)

* Avoid the scale to be infinity in quant2_int8_mkldnn_pass, test=develop
* support quantized model for paddle2.0 dygraph, test=develop
Co-authored-by: Wojciech Uss <wojciech.uss@intel.com>

3fe71d0a
