Commits · 2db79f0a08f9fafa8e71124ca1e22711953a8438 · PaddlePaddle / Paddle

12 Jan, 2021 3 commits

[Cherry-pick]Fix the accuracy problem of allclose op when using float64 data... · 2db79f0a

Committed by Zhen Wang on Jan 12, 2021

[Cherry-pick]Fix the accuracy problem of allclose op when using float64 data type in static mode.(#29890) (#30313)

* Fix the accuracy problem of allclose op when using float64 data type in static mode.

* Format the code style.

2db79f0a
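The commit message does not state what caused the accuracy problem. As a hedged illustration only, the sketch below shows one way a float64 comparison can go wrong: if a tolerance is held in single precision (an assumption made here, not something the message confirms), the usual allclose criterion `|a - b| <= atol + rtol * |b|` can flip for values sitting on the boundary. `to_float32` and `allclose_scalar` are hypothetical helpers, not Paddle APIs.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def allclose_scalar(a, b, rtol=1e-5, atol=1e-8):
    """Scalar form of the usual allclose criterion (hypothetical helper)."""
    return abs(a - b) <= atol + rtol * abs(b)


atol64 = 1e-8                # tolerance kept in double precision
atol32 = to_float32(atol64)  # same tolerance after a float32 round trip

# A difference sitting exactly on the larger of the two tolerances.
diff = max(atol64, atol32)

result64 = allclose_scalar(0.0, diff, rtol=0.0, atol=atol64)
result32 = allclose_scalar(0.0, diff, rtol=0.0, atol=atol32)
print(atol64 == atol32, result64, result32)
```

The two results disagree because `1e-8` is not exactly representable in float32, so the round trip changes the tolerance slightly.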

[Cherry-pick] Complex grad for matmul, kron and type promotion (#30304) · 7346edc2

Committed by chentianyu03 on Jan 12, 2021

* complex gradient matmul  (#29966)

* dot op support complex types

* matmul support complex types

* add test case

* matmul broadcast gradient support complex

* move conjFunctor to complex_functor.h

* change the kron gradient when complex types (#29995)

* type promotion for grad (#30177)

* type promotion for grad

* add type promotion for div op

7346edc2

Delete incorrect warning message (#30196) (#30262) · 501b11de

Committed by LielinJiang on Jan 12, 2021

* fix warning and no grad

501b11de

11 Jan, 2021 20 commits

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value... · d839761e

Committed by liym27 on Jan 11, 2021

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value support vector<double> as value (#30126) (#30305)

Cherry-Pick #30126
1. Support vector<float64> as type of op attribute.
2. op set_value supports float64 numpy.array

d839761e

[cherry-pick] Async drop scope in executor (#29714) #30285 · 93ce7f69

Committed by Leo Chen on Jan 11, 2021

[cherry-pick] Async drop scope in executor (#29714)

93ce7f69

[Cherry-Pick 2.0] Check the rank of input in kernel of set_value op (#30147) (#30301) · a2bbd06a

Committed by liym27 on Jan 11, 2021

cherry-pick #30147. For op set_value, check that the input's rank is < 7

a2bbd06a

[cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t,... · 04cc659c

Committed by WeiXin on Jan 11, 2021

[cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161) (#30280)

Add detailed error messages for curandStatus_t, cublasStatus_t, and cusolverStatus_t.
Original PR: #30161

04cc659c

[cherry pick] Fix bug for 'save multiple method' (#30218) (#30278) · d9c70217

Committed by WeiXin on Jan 11, 2021

* Fix bug for 'save multiple method'

* To pass coverage.

* edit code to pass coverage.

* edit code to pass coverage.

* add unittest for coverage.

* change for coverage.

* edit for coverage.

d9c70217

[Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze,... · 7c943a65

Committed by pangyoki on Jan 11, 2021

[Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) (#30258)

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allocation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput

7c943a65

[cherry-pick]add cast cuda kernel (#29352) #30263 · afbc6367

Committed by Zhang Ting on Jan 11, 2021

add cast cuda kernel

cherry-pick #29352

afbc6367

[Cherry-pick] Add Static Variable Clone (#30208) #30270 · 6dd70b9b

Committed by Huihuang Zheng on Jan 11, 2021

Cherry-pick of PR #30208. This PR adds a clone method to the static Variable so that its interface matches dygraph. It fixes some bugs in dy2stat where users called clone on a dygraph Tensor.

6dd70b9b

[cherry-pick]add support for place string representation #30264 · fb66355e

Committed by wangchaochaohu on Jan 11, 2021

cherry-pick #28769, add support for place string representation

fb66355e

[cherry-pick]Elementwise add grad GPU kernel optimization (#30276) · e59524f8

Committed by wangchaochaohu on Jan 11, 2021

* elementwise_add_grad Op optimization  (#29575)

* optimize for long width for elementwise (#29602)

* refine (#29622)

* delete the code for fp16 optimization because it is not faster than common template code (#29715)

* fix the shape choose of vectorize for cuda

* optimization for fp16 elementwise add (#29744)

* Fix the compiler error for half type (#29799)

* refine the compiler error for half2 operation (#29816)

* fix the compiler error when gcc4 cuda9.0 (#29997)

e59524f8

[Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54

Committed by Zhen Wang on Jan 11, 2021

* Support pure fp16 training for AMP API. (#29544)

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefine error about MPTypeTrait on windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Improve the doc of `amp_init`.

* Change for fp16 testing if users have the infer program defined in separate way.

* Remove tensor copy in the update_loss_scaling op. (#29426)

* remove tensor copy in the update_loss_scaling op

* not use thrust.

* fix some cuda memory access error.

d8dfef54
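The bullets above name `check_finite_and_unscale` and `update_loss_scaling` without describing them. The following is a minimal pure-Python sketch of dynamic loss scaling as it is commonly done in mixed-precision training. It is not Paddle's kernel code: the class `LossScaler`, its parameter names, and their default values are assumptions chosen for illustration.

```python
import math


def check_finite_and_unscale(grads, loss_scale):
    """Divide each gradient by loss_scale and report whether any is inf/nan."""
    unscaled = [g / loss_scale for g in grads]
    found_inf = any(not math.isfinite(g) for g in unscaled)
    return unscaled, found_inf


class LossScaler:
    """Hypothetical dynamic loss scaler (illustrative parameters only)."""

    def __init__(self, init_scale=2.0 ** 15, incr_every_n_steps=1000,
                 decr_every_n_nan_or_inf=2, incr_ratio=2.0, decr_ratio=0.5):
        self.scale = init_scale
        self.incr_every_n_steps = incr_every_n_steps
        self.decr_every_n_nan_or_inf = decr_every_n_nan_or_inf
        self.incr_ratio = incr_ratio
        self.decr_ratio = decr_ratio
        self.good_steps = 0
        self.bad_steps = 0

    def update(self, found_inf):
        """Shrink the scale after repeated overflows, grow it after clean steps."""
        if found_inf:
            self.good_steps = 0
            self.bad_steps += 1
            if self.bad_steps == self.decr_every_n_nan_or_inf:
                self.scale = max(self.scale * self.decr_ratio, 1.0)
                self.bad_steps = 0
        else:
            self.bad_steps = 0
            self.good_steps += 1
            if self.good_steps == self.incr_every_n_steps:
                self.scale *= self.incr_ratio
                self.good_steps = 0
        return self.scale


scaler = LossScaler(init_scale=8.0, incr_every_n_steps=2,
                    decr_every_n_nan_or_inf=1)
grads, found_inf = check_finite_and_unscale([16.0, float("inf")], scaler.scale)
scaler.update(found_inf)  # overflow seen, so the scale is halved
```

In a training loop, the optimizer step would be skipped whenever `found_inf` is true, so an overflowed batch never touches the weights.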

[Cherry pick] improve dropout (#30260) · b4931ab1

Committed by Zhang Ting on Jan 11, 2021

* improve dropout (#29465)

* improve drop out

* add VectorizedRandomGeneratorWithGenerator

* fix bug

* modify according to comments

* improve dropout grad (#29605)

* improve grad perf

* fix the bug of dropout_grad (#29813)

b4931ab1

[cherry-pick] softmax optimize (#30279) · b80beb16

Committed by GaoWei8 on Jan 11, 2021

* Softmax vectorization (#29404)

* vec softmax fw

* vec softmax bw

* add a message argument for compiler compatibility

* optimize softmax forward (#30217)

* optimize softmax forward

Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

b80beb16

Cherry-pick 30194 30164 30201(#30202) · 36de178a

Committed by Wilber on Jan 11, 2021

36de178a

Skip convert tensor shape while using Paddle.shape (#30223) (#30239) · 55604248

Committed by Aurelius84 on Jan 11, 2021

* fix tensor shape bug

* fix op_num

* clean code

55604248

Quantization supports 2.0 APIs (#30036) (#30257) · 393a91f1

Committed by guofei on Jan 11, 2021

* Quantization supports 2.0 APIs

* Fix the error of save_quantized_model

393a91f1

[cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f

Committed by WangXi on Jan 11, 2021

* Optimization grad merge performance (#29784)

* [fleet] combine amp and gradient merge, test=develop (#30086)

* fix assign_op_xpu concat_op_xpu warning (#30120)

Co-authored-by: liuyuhui <liuyuhui@baidu.com>

e283dc6f
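For readers unfamiliar with the term, gradient merge generally means accumulating gradients over several micro-steps and applying them in a single parameter update. The sketch below shows that idea on a scalar parameter with plain SGD. It does not reflect the optimized implementation in these PRs: the class `GradientMergeSGD` and its `k_steps`, `lr`, and `avg` parameters are hypothetical.

```python
class GradientMergeSGD:
    """Accumulate gradients for k_steps calls, then apply one SGD update."""

    def __init__(self, k_steps, lr, avg=True):
        self.k_steps = k_steps
        self.lr = lr
        self.avg = avg      # average the merged gradient instead of summing
        self.acc = 0.0      # running sum of gradients
        self.count = 0      # micro-steps seen since the last update

    def step(self, param, grad):
        """Return the parameter value after this micro-step."""
        self.acc += grad
        self.count += 1
        if self.count < self.k_steps:
            return param    # still accumulating, parameter is unchanged
        merged = self.acc / self.k_steps if self.avg else self.acc
        self.acc = 0.0
        self.count = 0
        return param - self.lr * merged


opt = GradientMergeSGD(k_steps=2, lr=0.1)
w = 1.0
w = opt.step(w, 0.5)  # first micro-step: no update yet
w = opt.step(w, 1.5)  # second micro-step: apply the averaged gradient
```

The effect is a larger effective batch size without holding all micro-batches in memory at once.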

[Cherry-pick] remove distributed prepare context (#30219) (#30256) · 1fa98c5d

Committed by Chen Weihang on Jan 10, 2021

As the title says, cherry-pick of #30219

1fa98c5d

[cherry-pick] clean redundant API alias in 2.0 - part 2 (#30244) · 70cbde83

Committed by XiaoguangHu on Jan 10, 2021

* fix dynamic to static error

* delete paddle.nn.functional.assign

70cbde83

add aarch64 and sunway kunlun lib (#30027) (#30237) · eacbd488

Committed by QingshuChen on Jan 11, 2021

* add aarch64 and sunway kunlun lib

* minor

* optimize elementwise_add for kunlun

* update kunlun dependence

* minor

* minor

eacbd488

10 Jan, 2021 1 commit

fix adamw apply gradient (#30130) (#30207) · c4cd99f3

Committed by WangXi on Jan 10, 2021

c4cd99f3

09 Jan, 2021 1 commit

fix pad (#30231) · 6d1fb79d

Committed by littletomatodonkey on Jan 09, 2021

6d1fb79d

08 Jan, 2021 11 commits

[Cherry-Pick 2.0] In creation.assign, reuse implementation code of... · 8e788e27

Committed by liym27 on Jan 08, 2021

[Cherry-Pick 2.0] In creation.assign, reuse implementation code of layers.tensor.assign to avoid maintaining two copies of the code (#30227) (#30236)

cherry-pick #30227

8e788e27

[cherry-pick] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not... · 2ba9bdd7

Committed by liym27 on Jan 08, 2021

[cherry-pick] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not negative #29965 (#30235)

* [Cherry-Pick 2.0] [Dy2Stat] Don't convert to paddle.shape if var_x.shape is not negative (#29965)

1. When x is a Variable, call nn.shape(x) only in the following cases:
 1) The shape of x is used in a control flow condition.
 2) The dim to be used is negative.
2. When x is a Variable but x.shape or x.shape[idx] doesn't contain a negative value, don't convert to paddle.shape().

* [Cherry-Pick 2.0] [Dy2Stat] Use Paddle2.0 api paddle.tensor.array_* (#30156)

2ba9bdd7
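The conversion rule in the first bullet of this commit can be read as a small decision function. The sketch below is an illustrative restatement in plain Python, assuming that a negative entry in a static shape marks a dimension unknown until run time. The function `needs_runtime_shape` and its parameters are hypothetical and are not part of the Dy2Stat transformer.

```python
def needs_runtime_shape(static_shape, idx=None, in_control_flow=False):
    """Decide whether a shape access must become a runtime shape op.

    static_shape: list of ints, negative meaning "unknown at compile time".
    idx: the dimension being read, or None when the whole shape is used.
    in_control_flow: True when the shape feeds a control flow condition.
    """
    if in_control_flow:
        return True
    dims = static_shape if idx is None else [static_shape[idx]]
    return any(d < 0 for d in dims)


shape = [-1, 3, 224, 224]                # batch size unknown, rest static
print(needs_runtime_shape(shape, idx=1))  # static dim, plain int suffices
print(needs_runtime_shape(shape, idx=0))  # unknown dim, runtime op needed
```

Keeping static dimensions as plain Python integers avoids inserting extra ops into the program when the value is already known.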

[Cherry-pick] amp related PR cherry pick into Release/2.0 (#30212) · 9f7c66b4

Committed by huangxu96 on Jan 08, 2021

* Optimizer trans momentum (#29597)

* merge amp related function in Momentum from paddle.fluid.contrib.optimizer into paddle.optimizer.

* Add unittest for 2.0  Momentum API.

* fix some bugs in weight_decay.

* add alias for fluid.contrib.mixed_precision (#29562)

* add alias for fluid.contrib.mixed_precision

* add static.amp into setup.py.in (#29621)

* add static.amp into setup.py.in

* add unittest for api

* fix a bug in multi_precision_fp16 unittest. (#29756)

9f7c66b4

[cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negative,... · 5fe3da39

Committed by liym27 on Jan 08, 2021

[cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns wrong result (#30003) (#30146)

1. When slice_item is a slice:
 1) the start of __getitem__ should be std::max(start, 0)
 2) the end of __getitem__ should be std::min(end, dim)
2. When slice_item is an integer, it should be in [-dim_len, dim_len)
3. Fix error message to use accurate data

5fe3da39
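The clamping rules listed in this commit can be sketched in a few lines of Python. This is an illustration under assumptions, not the C++ code from the PR: it assumes negative bounds are first offset by the dimension length (as in Python slicing) before clamping, and the helper names `normalize_slice` and `normalize_index` are hypothetical.

```python
def normalize_slice(start, end, dim_len):
    """Map possibly negative slice bounds (step 1) into [0, dim_len]."""
    if start < 0:
        start += dim_len       # assumed: negative bounds count from the end
    if end < 0:
        end += dim_len
    return max(start, 0), min(end, dim_len)


def normalize_index(idx, dim_len):
    """Validate an integer index against [-dim_len, dim_len)."""
    if not -dim_len <= idx < dim_len:
        raise IndexError(
            f"index {idx} is out of range for dimension of size {dim_len}")
    return idx + dim_len if idx < 0 else idx


data = list(range(5))
start, end = normalize_slice(-3, 100, len(data))  # (2, 5)
picked = [data[i] for i in range(start, end)]     # same as data[-3:100]
```

Clamping after the offset is what keeps an out-of-range bound such as `100` from reading past the end of the dimension.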

[Cherry-Pick 2.0][setitem] Support Tensor setitem in static mode (#29708) (#30104) · f46ddc0e

Committed by liym27 on Jan 08, 2021

1. Type of index: int, slice(step must be 1).

2. Type of value:
 (1) int32, int64, float32, bool;
 (2) numpy.array(int32, int64, float32, bool);<Note: float64 is not supported>
 (3) paddle.Tensor(int32, int64, float32, float64, bool);

f46ddc0e

Fix beam search bug (#29824) (#30140) · b2ca2cad

Committed by Jiaqi Liu on Jan 08, 2021

* fix beam search bug

* add dygraph unittest

* update dynamic_decode argument doc

* add warning info for state which has no lengths attribute

b2ca2cad

[Cherry-pick] [Complex] Simplify prepared op impl to improve performance (#30153) (#30215) · 0e3a1d35

Committed by Chen Weihang on Jan 08, 2021

* simplify prepared op impl to improve performance

* fix kunlun compile error

* continue fix kunlun compile error

* only transform diff place when dtype diff

* fix failed unittests

* remove useless file

* polish impl by review comment

0e3a1d35

【2.0API CherryPick】LookAhead, ModelAverage, IndexSelect (#30205) · 3ce4d34d

Committed by 123malin on Jan 08, 2021

* Add Lookahead and ModelAverage Optimizer (#30004)

* test=develop, add model_average and lookahead

* Improve Index select cuda kernel (#30139)

* test=develop, add index_select_cuda kernel

3ce4d34d
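The commit adds a Lookahead optimizer without describing the update rule. As commonly described, Lookahead keeps "fast" weights updated by an inner optimizer and, every `k` steps, moves "slow" weights a fraction `alpha` of the way toward them. The sketch below applies that rule to a scalar with plain SGD as the inner optimizer. The class `LookaheadSGD` and its defaults are assumptions for illustration, not the API added in this PR.

```python
class LookaheadSGD:
    """Scalar Lookahead wrapper around plain SGD (illustrative only)."""

    def __init__(self, param, lr=0.1, alpha=0.5, k=5):
        self.fast = param   # weights updated on every step
        self.slow = param   # weights updated once every k steps
        self.lr = lr
        self.alpha = alpha
        self.k = k
        self.steps = 0

    def step(self, grad):
        """Apply one inner SGD step and synchronize every k steps."""
        self.fast -= self.lr * grad
        self.steps += 1
        if self.steps % self.k == 0:
            # Move the slow weights toward the fast ones, then restart from them.
            self.slow += self.alpha * (self.fast - self.slow)
            self.fast = self.slow
        return self.fast


opt = LookaheadSGD(1.0, lr=0.1, alpha=0.5, k=2)
opt.step(1.0)       # fast weights move, slow weights stay at 1.0
w = opt.step(1.0)   # k-th step: slow weights interpolate toward fast weights
```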

fix paddle.pow doc, test=document_fix (#30159) (#30213) · 8d3648c8

Committed by LutaoChu on Jan 08, 2021

8d3648c8

fix syncbn convert (#30158) (#30176) · 030d678c

Committed by ceci3 on Jan 08, 2021

* fix syncbn convert

* add unittest

030d678c

[Cherry-pick] Simplify the options of spawn based on fleetrun (#30144) (#30197) · 39204d56

Committed by Chen Weihang on Jan 07, 2021

* Simplify the options of spawn based on fleetrun (#30144)

* Simplify the options of spawn based on fleetrun

* polish details

* polish doc details

* cleanup enum test=develop (#29294)

Co-authored-by: gongweibao <weibao.gong@gmail.com>

39204d56

07 Jan, 2021 4 commits

[cherry pick] paddle.save/load, paddle.static.save/load: bug when saving large files (#30170) · bfb6f613

Committed by WeiXin on Jan 07, 2021

* Support storage of large parameters (#29988)

* Support storage of large parameters

* Reduce the complexity of the unittest

* Reduce the complexity of the unittest,commented out unittest for

* add unittest for static.save/load

* Increase the timeout threshold of 'test_static_save_load'

* Increase the timeout threshold of 'test_static_save_load'

* Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'

* Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'

* Extend the timeout for the (#30151)

bfb6f613

fix error message (#30135) (#30182) · 9f02c284

Committed by ShenLiang on Jan 07, 2021

9f02c284

[cherry pick] Some optimizations of elementwise_add, gelu and dropout for AMP (#30152) · 07f68fad

Committed by Leo Chen on Jan 07, 2021

* Improve performance of elementwise_add grad op (#29187)

* pass stop_gradient for cast op

* improve performance of elementwise_add grad

* use tensor copy async

* dygraph branch

* fix dygraph branch

* add ut

* make gelu fp16 computing more robust (#29484)

* Add fast path for dropout when p == 0  (#29553)

* add fast path for p == 0 in dropout

* add ut

07f68fad
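The "fast path for p == 0" item is easy to picture: when nothing is dropped, the mask generation and rescaling can be skipped and the input returned as-is. The sketch below shows this on a Python list, assuming the scale-at-training-time convention (kept values are divided by `1 - p`). The function `dropout` here is a hypothetical stand-in, not the CUDA kernel changed by the PR.

```python
import random


def dropout(xs, p, rng=random):
    """Drop each element with probability p, rescaling the kept ones."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p == 0.0:
        # Fast path: nothing is dropped, so no random numbers or scaling needed.
        return list(xs)
    if p == 1.0:
        return [0.0 for _ in xs]
    keep = 1.0 - p
    return [x / keep if rng.random() >= p else 0.0 for x in xs]


values = [1.0, 2.0, 3.0]
unchanged = dropout(values, 0.0)  # identical values, no mask generated
```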

[Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63

Committed by furnace on Jan 07, 2021

* Layer norm fp16 (#29169)

* add fp16 for layer_norm op

* revert layernorm api

* fix forward

* fix forward

* fix backward for layernorm with fp16

* fix unit test for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>

* fix with_mkldnn compile error for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* fix layer_norm accuracy (#29434)

* Layernorm opt (#29522)

* layernorm fw opt

* layernorm bw opt

* fix typo, test=develop

* remove const dim3 for windows CI compatibility

* merge develop

Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

* Fix compile problem when cuda_arch < 6000 (#29576)

* fix compile problem when cuda_arch < 6000

* refine code

* refine code

Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

44b81e63

PaddlePaddle / Paddle: synced successfully about 1 year ago
