Commits · 436144e965f7c2533cfa16938501f761e5d7b808 · BaiXuePrincess / Paddle

19 Jan, 2021 9 commits
- fix adamw lr_to_coeff is fixed when dygraph (#30526) (#30559) · 436144e9
  Committed by WangXi on Jan 19, 2021
- [cherry pick] Fix two bugs related to save/load (#30543) · 832032c2
  Committed by WeiXin on Jan 19, 2021
```
Original PRs: #30485, #30507
```
- [cherry-pick] support layer_norm fp16 in dygraph amp (#30430) #30566 · 0ea41e62
  Committed by Leo Chen on Jan 19, 2021
```
[cherry-pick] support layer_norm fp16 in dygraph amp (#30430)
```
- [cherry pick]perfect 'var_list' of static.load/fluid.load (#30457) (#30479) · 5844dfe4
  Committed by WeiXin on Jan 19, 2021
```
Improve the var_list parameter of static.load.
When multiple small files are loaded, the Tensor list may be a subset of the Tensors in all loaded files.
Original PR: #30457
```
- [Cherry-Pick] Fix bug: GetAttrValue should deal with attr with attrType vector<double> (#30564) · f15bed11
  Committed by liym27 on Jan 19, 2021
```
cherry-pick #30536
```
- [2.0 API] device guard (#30307) (#30562) · 46322911
  Committed by Zhang Ting on Jan 19, 2021
```
* add 2.0 API: device_guard
```
- Ascend Framework Part1: OP & Wrapper (#30281) (#30546) · 6f563ace
  Committed by hutuxian on Jan 19, 2021
- Pd2.0 (#30532) · 1323e5e7
  Committed by taixiurong on Jan 19, 2021
```
* support transformer v2.0

* fix range op crash in dygraph xpu place
```
- Recompute Offload: fixed bug in memcpy (#30484) (#30517) · 7a4ccf59
  Committed by JZ-LIANG on Jan 19, 2021
18 Jan, 2021 5 commits

[cherry-pick]Modify the calculation logic of LambOptimizer (#29313) (#30510) · b3fa899b

Committed by guofei on Jan 18, 2021

* Modify the calculation logic of LambOptimizer (#29313)

* Modify the calculation logic of LambOptimizer

* Modify the calculation logic of LambOptimizer

* Modify the calculation logic of LambOptimizer

[cherry-pick] add pad and concat double grad #29549 (#30432) · 5e4d54a1
Committed by ceci3 on Jan 18, 2021
```
* add pad and concat double grad

* resolve conflict
```

[cherry-pick] improve perfomance of cast and tril op (#30498) · de003cee

Committed by Zhang Ting on Jan 18, 2021

* add fp16 support for tril_triu op (#30186)

* add VecCastCUDAKernel (#30296)
Co-authored-by: furnace <34057289+windstamp@users.noreply.github.com>

test=develop, fix fleet.metric (#30438) (#30473) · 2c3799d1
Committed by 123malin on Jan 18, 2021
```
* test=develop, fix fleet.metrics(mse, rmse, mae)
```

Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) (#30496) · 27c2f1ea

Committed by pangyoki on Jan 18, 2021

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allocation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* fix test_cross_entropy_loss error because of reshape2

* add inplace strategy

* add elementwise_add sub

* let backward op not use inplace

* grad op do not use inplace

* fix memory increase error and add leaf error message

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput

* add unittest and leaf error message

* merge view error

* optimize op_function_generator format and support sum inplace op

* fix format of basic_engine

* fix format for framework

* little change of variable wrapper

* add reshape, squeeze, unsqueeze, scatter api

* add relu elu tanh softmax inplace api

* fix test_squeeze_op unittest

* fix test_relu_op unittest

* fix comment problems

* delete sample code of inplace api

* add reference of grad_pending_nodes in basic_engine

* fix unittest name

* add inplace apis into wlist

* fix error message

* add PADDLE_ENFORCE for set grad op twice

* fix head file error


15 Jan, 2021 4 commits

Cherry pick 30072 (#30499) · 590e718b

Committed by pangyoki on Jan 15, 2021

* Cherry-pick 30072, add dispensable input for core.ops.reshape2/expand/slice (#30072)

* add dispensable input 'shape' for core.ops.reshape2

* add dispensable inputs for core.ops.reshape2/expand/slice

* add ut

* save reshape update in pr 30180

* save reshape update v2 in pr 30180
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>


add transpose double grad , cherry-pick from #29600 (#30435) · badc6f22

Committed by lijianshe02 on Jan 15, 2021

* add transpose double grad test=develop (#29600)

* add transpose double grad test=develop

* cherry-pick test=develop

Support double backward rsqrt (#29589) (#30431) · 71ab8ae9
Committed by whs on Jan 15, 2021


【Cherry-Pick】add distributed_infer (#30300) (#30427) · ae75affd

Committed by 123malin on Jan 15, 2021

* test=develop, add distributed_infer (#30300)

* test=develop, add distributed_infer

* test=develop, fix unittest cmakefile conflict

* test=develop, fix test_dist_fleet_base


14 Jan, 2021 4 commits

[Cherry-pick] Fix prune input bug of jit.save #30425 · 2cdc36f4
Committed by Chen Weihang on Jan 14, 2021
```
[Cherry-pick] Fix prune input bug of jit.save

cherry-pick of #30384
```

optimize memcpy perf for kunlun (#30291) (#30382) · 9de42be2

Committed by QingshuChen on Jan 14, 2021

* optimize memcpy perf for kunlun (#30291)

* optimize memcpy perf for kunlun

* remove useless unittest for kunlun mean

* minor

* fix bug that can't find mkldnn(kunlun) (#30394)


[cherrypick 2.0] add double grad for conv_transpose and depthwise_conv (#30429) · 1552343a

Committed by LielinJiang on Jan 14, 2021

* Add double grad for conv_transpose (#29706)

* add double grad for conv_transpose

* register cudnn conv double grad for depthwise conv (#29807)


fix bug of celoss when using ignore_index and reduction (#30395) · c22ee575

Committed by chajchaj on Jan 14, 2021

* fix bug of celoss when using ignore_index and reduction (#30180)

* fix bug of using ignore_index and reduction,test=develop

* fix bug of celoss when using ignore_index and reduction, test=develop

* improve performance when ignore_index=-100, test=develop

* add test in test_cross_entropy_loss.py for coverage rate, test=develop

* rm comment in test_cross_entropy_loss.py, test=develop

* del  hard code of "float64" in python/paddle/nn/functional/loss.py, test=develop

* change mask to a more simplified implementation, test=develop

* del comment in python/paddle/nn/functional/loss.py, test=develop

* del hard code and change mask to a more simplified implementation, test=develop

* change mask to a more simplified implementation, test=develop

* change mask to a more simplified implementation, test=develop
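
The commit message above does not say what the bug was. As background only, the sketch below shows one common convention for combining `ignore_index` with `reduction`: ignored samples contribute zero loss, and `mean` divides by the number of non-ignored samples. The function name `cross_entropy`, its pure-Python form, and the choice to return 0.0 when every label is ignored are assumptions made for illustration; this is not the code in python/paddle/nn/functional/loss.py.

```python
import math


def cross_entropy(logits, labels, ignore_index=-100, reduction="mean"):
    """Softmax cross entropy per row of `logits` (hypothetical helper).

    Rows whose label equals `ignore_index` contribute a loss of 0.0.
    """
    losses = []
    for row, label in zip(logits, labels):
        if label == ignore_index:
            losses.append(0.0)
            continue
        # Numerically stable log-sum-exp of the row.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        losses.append(log_sum_exp - row[label])

    if reduction == "none":
        return losses
    total = sum(losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        # Assumed convention: average over non-ignored samples only.
        valid = sum(1 for label in labels if label != ignore_index)
        return total / valid if valid else 0.0
    raise ValueError("reduction must be 'none', 'sum' or 'mean'")


logits = [[0.0, 0.0], [0.0, 0.0], [5.0, 1.0]]
labels = [0, -100, 1]
per_sample = cross_entropy(logits, labels, reduction="none")
mean_loss = cross_entropy(logits, labels, reduction="mean")
```

With the second sample ignored, `mean_loss` here is the average of the first and third per-sample losses, not of all three.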


13 Jan, 2021 2 commits
- Recompute Offload (#30233) (#30372) · 3fbc3cf4
  Committed by JZ-LIANG on Jan 13, 2021
- Support unused parameters in dynamic graph distributed (#30224) (#30374) · 020e2431
  Committed by ShenLiang on Jan 13, 2021
12 Jan, 2021 6 commits

[cherry]Add callback after TensorCopy (#30123) (#30268) · 9d0a1eb4

Committed by Leo Chen on Jan 12, 2021

* change to tensor copy sync

* change to tensor copy sync

* make copy_to safe when use TensorCopy

* refine code

* add ut

* add cudapinned garbagecollector

* add testcase: cpu place -> cuda pinned place


[2.0 Cherry-pick]fix 2.0 error message (#30332) · df67b317

Committed by swtkiwi on Jan 12, 2021

* fix datanorm error msg (#30294)

* Optimize the error message of framework. (#30134)

* modify error message based on comments (#30189)

* modify error message based on comments

* edit code according to review.

* Correct spelling according to review.

* fix enforce msg of sum xpu op (#30113)

* enhance error info for py_func (#30138)

* enhance error info for py_func

* update

* fix elugradgrad test fail & error message opt (#30171)

* fix elugradgrad test fail and error message opt

* fix unittest, test=develop

* Update prroi_pool_op.h

fix error message

* opt message,test=develop

* fix ci fail,test=develop

* Refine PADDLE_ENFORCE Error Messages. test=develop (#30149)

Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc

* enhance error message, test=develop (#30220)

* fix error message for distribute_fpn_proposals_op (#30116)

* enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240)

* just add the op error message for the matmul xpu (#30246)

 add the op error message for the matmul xpu

* enhance error message of nll_loss op test=develop (#30125)

* enhance error message of nll_loss op test=develop
Co-authored-by: yaoxuefeng <yaoxuefeng@baidu.com>
Co-authored-by: xiemoyuan <71377852+xiemoyuan@users.noreply.github.com>
Co-authored-by: WeiXin <weixin10@baidu.com>
Co-authored-by: Jack Zhou <zhoushunjie@baidu.com>
Co-authored-by: Wilber <jiweibo@baidu.com>
Co-authored-by: Double_V <liuvv0203@163.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com>
Co-authored-by: zhang wenhui <frankwhzhang@126.com>
Co-authored-by: wangguanzhong <jerrywgz@126.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
Co-authored-by: lijianshe02 <48898730+lijianshe02@users.noreply.github.com>

cherry pick tensor table (#30221) · 330aea6e
Committed by Chengmo on Jan 12, 2021


[cherry-pick]memory optimization for fuse pattern of elemwise_add + act (#30303) · b207b8a7

Committed by wangchaochaohu on Jan 12, 2021

* reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op (relu Op for example) (#29885)

* register OPMaker and Infer Shape Check for fused_elementwise_add (#30259)


[Cherry-pick]Fix the accuracy problem of allclose op when using float64 data type in static mode.(#29890) (#30313) · 2db79f0a

Committed by Zhen Wang on Jan 12, 2021

* Fix the accuracy problem of allclose op when using float64 data type in static mode.

* Format the code style.


[Cherry-pick] Complex grad for matmul, kron and type promotion (#30304) · 7346edc2

Committed by chentianyu03 on Jan 12, 2021

* complex gradient matmul  (#29966)

* dot op support complex types

* matmul support complex types

* add test case

* matmul broadcast gradient support complex

* move conjFunctor to complex_functor.h

* change the kron gradient when complex types (#29995)

* type promotion for grad (#30177)

* type promotion for grad

* add type promotion for div op


11 Jan, 2021 10 commits

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value suppport vector<double> as value (#30126) (#30305) · d839761e

Committed by liym27 on Jan 11, 2021

Cherry-Pick #30126
1. Support vector<float64> as type of op attribute.
2. op set_value supports float64 numpy.array


[cherry pick] Fix bug for 'save mutiple method' (#30218) (#30278) · d9c70217

Committed by WeiXin on Jan 11, 2021

* Fix bug for 'save multiple method'

* To pass coverage.

* edit code to pass coverage.

* edit code to pass coverage.

* add unittest for coverage.

* change for coverage.

* edit for coverage.


[Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) (#30258) · 7c943a65

Committed by pangyoki on Jan 11, 2021

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allocation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput


[Cherry-pick] Add Static Variable Clone (#30208) #30270 · 6dd70b9b

Committed by Huihuang Zheng on Jan 11, 2021

Cherry-pick of PR #30208. This PR adds a clone method to the static Variable so that its interface matches dygraph. It fixes some bugs in dy2stat where users called clone on a dygraph Tensor.

[cherry-pick]add support for place string representation #30264 · fb66355e
Committed by wangchaochaohu on Jan 11, 2021
```
cherry-pick #28769, add support for place string representation 
```

[cherry-pick]Elementwise add grad GPU kernel optimization (#30276) · e59524f8

Committed by wangchaochaohu on Jan 11, 2021

* elementwise_add_grad Op optimization  (#29575)

* optimize for long width for elementwise (#29602)

* refine (#29622)

* delete the code for fp16 optimization because it is not faster than common template code (#29715)

* fix the shape choose of vectorize for cuda

* optimization for fp16 elementwise add (#29744)

* Fix the compiler error for half type (#29799)

* refine the compiler error for half2 operation (#29816)

* fix the compiler error when gcc4 cuda9.0 (#29997)


[Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54

Committed by Zhen Wang on Jan 11, 2021

* Support pure fp16 training for AMP API. (#29544)

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefine error about MPTypeTrait on windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Improve the doc of `amp_init`.

* Change for fp16 testing if users have the infer program defined in separate way.

* Remove tensor copy in the update_loss_scaling op. (#29426)

* remove tensor copy in the update_loss_scaling op

* not use thrust.

* fix some cuda memory access error.
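
The list above names two ops, `check_finite_and_unscale` and `update_loss_scaling`, without describing them. The sketch below shows the dynamic loss-scaling steps that ops with these names conventionally perform in mixed-precision training: unscale the gradients while checking for inf/nan, then shrink the scale after an overflow or grow it after a run of clean steps. It is a simplified pure-Python illustration, not the Paddle kernels; the parameter names and defaults (`incr_every_n_steps`, `incr_ratio`, `decr_ratio`), the decrease-on-every-overflow rule, and the lower bound of 1.0 are assumptions.

```python
import math


def check_finite_and_unscale(grads, scale):
    """Divide each gradient by `scale` and report whether any is inf or nan."""
    found_inf = any(not math.isfinite(g) for g in grads)
    unscaled = [g / scale for g in grads]
    return unscaled, found_inf


def update_loss_scaling(scale, good_steps, found_inf,
                        incr_every_n_steps=2, incr_ratio=2.0, decr_ratio=0.5):
    """Return the new (scale, good_steps) pair after one training step."""
    if found_inf:
        # Overflow: shrink the scale (never below 1.0) and restart the count.
        return max(scale * decr_ratio, 1.0), 0
    good_steps += 1
    if good_steps >= incr_every_n_steps:
        # Enough consecutive clean steps: grow the scale.
        return scale * incr_ratio, 0
    return scale, good_steps


scale, good_steps = 1024.0, 0
grads, found_inf = check_finite_and_unscale([2048.0, -512.0], scale)
if not found_inf:
    pass  # an optimizer would apply `grads` here; skipped on overflow
scale, good_steps = update_loss_scaling(scale, good_steps, found_inf)
```

In this scheme a step whose gradients overflow is skipped entirely, which is why the finiteness check and the unscaling are done together before the optimizer update.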

Skip convert tensor shape while using Paddle.shape (#30223) (#30239) · 55604248
Committed by Aurelius84 on Jan 11, 2021
```
* fix tensor shape bug

* fix op_num

* clean code
```

[cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f

Committed by WangXi on Jan 11, 2021

* Optimization grad merge performance (#29784)

* [fleet] combine amp and gradient merge, test=develop (#30086)

* fix assign_op_xpu concat_op_xpu warning (#30120)
Co-authored-by: liuyuhui <liuyuhui@baidu.com>

[Cherry-pick] remove distributed prepare context (#30219) (#30256) · 1fa98c5d
Committed by Chen Weihang on Jan 10, 2021
```
att, cherry-pick of #30219
```
