Commits · a44b65dee0320f4969ceaa8d0a93cb9bb4806e7d · BaiXuePrincess / Paddle

03 Feb, 2021 1 commit

[cherry-pick] Update gather_tree (#30784) · a44b65de

Committed by liu zhengxi on Feb 03, 2021

* upgrade gather_tree to core.ops (#30697)

* upgrade gather_tree to core.ops

* update gather_tree unittests

* update gather_tree doc (#30693)

* update gather_tree doc, test=document_fix

* update sample code, test=document_fix

* remove tensor type, test=document_fix
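
For readers unfamiliar with the operator this commit touches: gather_tree is used after beam search to backtrace from the last time step, following parent indices, and recover each beam's full sequence. The following is a minimal pure-Python sketch of that idea, not Paddle's kernel. The name `gather_tree_sketch` and the nested-list layout `[max_time][batch][beam]` are assumptions made for illustration.

```python
def gather_tree_sketch(ids, parents):
    """Backtrace beam-search output (illustrative, not Paddle's kernel).

    ids and parents are nested lists shaped [max_time][batch][beam].
    """
    max_time = len(ids)
    batch_size = len(ids[0])
    beam_size = len(ids[0][0])
    out = [[[0] * beam_size for _ in range(batch_size)] for _ in range(max_time)]
    for b in range(batch_size):
        for k in range(beam_size):
            # The last step is copied as-is; earlier steps follow the parent chain.
            out[max_time - 1][b][k] = ids[max_time - 1][b][k]
            parent = parents[max_time - 1][b][k]
            for t in range(max_time - 2, -1, -1):
                out[t][b][k] = ids[t][b][parent]
                parent = parents[t][b][parent]
    return out


ids = [[[2, 2], [6, 1]], [[3, 9], [6, 1]], [[0, 1], [9, 0]]]
parents = [[[0, 0], [1, 1]], [[1, 0], [1, 0]], [[0, 0], [0, 1]]]
result = gather_tree_sketch(ids, parents)
print(result)  # [[[2, 2], [1, 6]], [[3, 3], [6, 1]], [[0, 1], [9, 0]]]
```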

02 Feb, 2021 1 commit
- Optimize the encoder of Transformer. (#30439) (#30813) · 63605387
  Committed by xiemoyuan on Feb 02, 2021
```
* Add cache for Transformer encoder.

* Bug fixed.

* add unittests for transformer encoder.
```
20 Jan, 2021 5 commits
- [cherry-pick]Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) (#30612) · fd9d6fda
  Committed by AshburnLee on Jan 20, 2021
```
* Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)

* Fixed an error

* Fixed an error
```
- [Cherry-pick] Fix the bug in fleet amp_init. (#30606) (#30608) · 09aed38d
  Committed by Zhen Wang on Jan 20, 2021
```
* Fix the bug in fleet amp_init.

* Fix the amp_init unit test.
```
- Add tf32 switch for cuDNN (#29192) (#30574) · 138a71b7
  Committed by AshburnLee on Jan 20, 2021
```
This PR is cherry-picked from PR: #29192
Function: Adds a TF32 switch for cuDNN. It is on by default and is turned off when the user sets the switch to False.
```
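
The TF32 commits in this group concern a reduced-precision math mode: TF32 keeps float32's 8-bit exponent but only 10 of its 23 mantissa bits. The sketch below emulates that precision loss in plain Python so the trade-off behind the switch is easier to see. It is an illustration only: `to_tf32_sketch` is a hypothetical helper, it uses simple truncation (real hardware may round differently), and it does not call any Paddle, cuBLAS, or cuDNN API.

```python
import struct


def to_tf32_sketch(x):
    """Emulate TF32 precision by keeping 10 of float32's 23 mantissa bits.

    Assumption: plain truncation; real hardware may round differently.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


fine = 1.0 + 2.0 ** -12    # needs 12 mantissa bits; exact in float32
coarse = 1.0 + 2.0 ** -10  # needs only 10 mantissa bits

print(to_tf32_sketch(fine))    # the 2**-12 term is dropped
print(to_tf32_sketch(coarse))  # survives unchanged
```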
- [Dy2static]Fix paddle prefix in is_paddle_api (#30569) (#30594) · 12c51f57
  Committed by Aurelius84 on Jan 20, 2021
```
[Dy2static]Fix paddle prefix in is_paddle_api (#30569)
cherry-pick #30569
```
- [cherry pick]Add pure fp16 amp_init for fleet API. (#30592) · 3317cf01
  Committed by huangxu96 on Jan 20, 2021
```
* add fleet amp.init()

* add unittest for fleet_amp_init
```
19 Jan, 2021 9 commits
- fix adamw lr_to_coeff is fixed when dygraph (#30526) (#30559) · 436144e9
  Committed by WangXi on Jan 19, 2021
- [cherry pick] Fix two bugs related to save/load (#30543) · 832032c2
  Committed by WeiXin on Jan 19, 2021
```
Original PRs: #30485, #30507
```
- [cherry-pick] support layer_norm fp16 in dygraph amp (#30430) #30566 · 0ea41e62
  Committed by Leo Chen on Jan 19, 2021
```
[cherry-pick] support layer_norm fp16 in dygraph amp (#30430)
```
- [cherry pick]perfect 'var_list' of static.load/fluid.load (#30457) (#30479) · 5844dfe4
  Committed by WeiXin on Jan 19, 2021
```
Improve the var_list parameter of static.load.
When multiple small files are loaded, the Tensor list may be a subset of the Tensors in all loaded files.
Original PR: #30457
```
- [Cherry-Pick] Fix bug: GetAttrValue should deal with attr with attrType vector<double> (#30564) · f15bed11
  Committed by liym27 on Jan 19, 2021
```
cherry-pick #30536
```
- [2.0 API] device guard (#30307) (#30562) · 46322911
  Committed by Zhang Ting on Jan 19, 2021
```
* add 2.0 API: device_guard
```
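
A guard of this kind is typically a context manager that tags operators created inside its block with a device and restores the previous setting on exit. The sketch below shows that general pattern in plain Python. `device_guard_sketch`, `create_op`, and `_current_device` are hypothetical names; nothing here reflects how Paddle actually implements device_guard.

```python
from contextlib import contextmanager

_current_device = None  # hypothetical module-level state


@contextmanager
def device_guard_sketch(device):
    """Temporarily set the device recorded on newly created ops."""
    global _current_device
    previous = _current_device
    _current_device = device
    try:
        yield
    finally:
        _current_device = previous  # restore even if the body raises


def create_op(name):
    """Hypothetical op factory that records the active device."""
    return {"op": name, "device": _current_device}


with device_guard_sketch("cpu"):
    shape_op = create_op("shape")
    with device_guard_sketch("gpu"):
        matmul_op = create_op("matmul")
    slice_op = create_op("slice")
after_op = create_op("relu")
```

Nesting works because each guard remembers the value it replaced, so leaving the inner block returns to the outer device rather than to the default.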
- Ascend Framework Part1: OP & Wrapper (#30281) (#30546) · 6f563ace
  Committed by hutuxian on Jan 19, 2021
- Pd2.0 (#30532) · 1323e5e7
  Committed by taixiurong on Jan 19, 2021
```
* support transformer v2.0

* fix range op crash in dygraph xpu place
```
- Recompute Offload: fixed bug in memcpy (#30484) (#30517) · 7a4ccf59
  Committed by JZ-LIANG on Jan 19, 2021
18 Jan, 2021 5 commits

[cherry-pick]Modify the calculation logic of LambOptimizer (#29313) (#30510) · b3fa899b

Committed by guofei on Jan 18, 2021

* Modify the calculation logic of LambOptimizer (#29313)

* Modify the calculation logic of LambOptimizer

* Modify the calculation logic of LambOptimizer

* Modify the calculation logic of LambOptimizer

[cherry-pick] add pad and concat double grad #29549 (#30432) · 5e4d54a1
Committed by ceci3 on Jan 18, 2021
```
* add pad and concat double grad

* resolve conflict
```

[cherry-pick] improve performance of cast and tril op (#30498) · de003cee

Committed by Zhang Ting on Jan 18, 2021

* add fp16 support for tril_triu op (#30186)

* add VecCastCUDAKernel (#30296)
Co-authored-by: furnace <34057289+windstamp@users.noreply.github.com>

test=develop, fix fleet.metric (#30438) (#30473) · 2c3799d1
Committed by 123malin on Jan 18, 2021
```
* test=develop, fix fleet.metrics(mse, rmse, mae)
```

Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in... · 27c2f1ea

Committed by pangyoki on Jan 18, 2021

Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) (#30496)

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allocation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* fix test_cross_entropy_loss error because of reshape2

* add inplace strategy

* add elementwise_add sub

* let backward op not use inplace

* grad op do not use inplace

* fix memory increase error and add leaf error message

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput

* add unittest and leaf error message

* merge view error

* optimize op_function_generator format and support sum inplace op

* fix format of basic_engine

* fix format for framework

* little change of variable wrapper

* add reshape, squeeze, unsqueeze, scatter api

* add relu elu tanh softmax inplace api

* fix test_squeeze_op unittest

* fix test_relu_op unittest

* fix comment problems

* delete sample code of inplace api

* add reference of grad_pending_nodes in basic_engine

* fix unittest name

* add inplace apis into wlist

* fix error message

* add PADDLE_ENFORCE for set grad op twice

* fix head file error

15 Jan, 2021 4 commits

Cherry pick 30072 (#30499) · 590e718b

Committed by pangyoki on Jan 15, 2021

* Cherry-pick 30072, add dispensable input for core.ops.reshape2/expand/slice (#30072)

* add dispensable input 'shape' for core.ops.reshape2

* add dispensable inputs for core.ops.reshape2/expand/slice

* add ut

* save reshape update in pr 30180

* save reshape update v2 in pr 30180
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>

add transpose double grad , cherry-pick from #29600 (#30435) · badc6f22

Committed by lijianshe02 on Jan 15, 2021

* add transpose double grad test=develop (#29600)

* add transpose double grad test=develop

* cherry-pick test=develop

Support double backward rsqrt (#29589) (#30431) · 71ab8ae9
Committed by whs on Jan 15, 2021
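
Double backward for an operator needs the derivative of its gradient. For rsqrt, y = x^(-1/2), so dy/dx = -1/2 · x^(-3/2) and d²y/dx² = 3/4 · x^(-5/2). The sketch below checks those two formulas against central finite differences in plain Python. The function names are hypothetical and this is a numerical sanity check, not the kernel added by the commit.

```python
def rsqrt(x):
    return x ** -0.5


def rsqrt_grad(x):
    # First derivative of x**-0.5
    return -0.5 * x ** -1.5


def rsqrt_double_grad(x):
    # Second derivative of x**-0.5
    return 0.75 * x ** -2.5


def central_diff(f, x, h=1e-5):
    """Symmetric finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x = 4.0
numeric_first = central_diff(rsqrt, x)        # should be close to rsqrt_grad(x)
numeric_second = central_diff(rsqrt_grad, x)  # should be close to rsqrt_double_grad(x)
print(numeric_first, rsqrt_grad(x))
print(numeric_second, rsqrt_double_grad(x))
```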

【Cherry-Pick】add distributed_infer (#30300) (#30427) · ae75affd

Committed by 123malin on Jan 15, 2021

* test=develop, add distributed_infer (#30300)

* test=develop, add distributed_infer

* test=develop, fix unittest cmakefile conflict

* test=develop, fix test_dist_fleet_base

14 Jan, 2021 4 commits

[Cherry-pick] Fix prune input bug of jit.save #30425 · 2cdc36f4
Committed by Chen Weihang on Jan 14, 2021
```
[Cherry-pick] Fix prune input bug of jit.save

cherry-pick of #30384
```

optimize memcpy perf for kunlun (#30291) (#30382) · 9de42be2

Committed by QingshuChen on Jan 14, 2021

* optimize memcpy perf for kunlun (#30291)

* optimize memcpy perf for kunlun

* remove useless unittest for kunlun mean

* minor

* fix bug that can't find mkldnn(kunlun) (#30394)

[cherrypick 2.0] add double grad for conv_transpose and depthwise_conv (#30429) · 1552343a

Committed by LielinJiang on Jan 14, 2021

* Add double grad for conv_transpose (#29706)

* add double grad for conv_transpose

* register cudnn conv double grad for depthwise conv (#29807)

fix bug of celoss when using ignore_index and reduction (#30395) · c22ee575

Committed by chajchaj on Jan 14, 2021

* fix bug of celoss when using ignore_index and reduction (#30180)

* fix bug of using ignore_index and reduction,test=develop

* fix bug of celoss when using ignore_index and reduction, test=develop

* improve performance when ignore_index=-100, test=develop

* add test in test_cross_entropy_loss.py for coverage rate, test=develop

* rm comment in test_cross_entropy_loss.py, test=develop

* del  hard code of "float64" in python/paddle/nn/functional/loss.py, test=develop

* change mask to a more simplified implementation, test=develop

* del comment in python/paddle/nn/functional/loss.py, test=develop

* del hard code and change mask to a more simplified implementation, test=develop

* change mask to a more simplified implementation, test=develop

* change mask to a more simplified implementation, test=develop
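
To make the interaction between ignore_index and reduction concrete, here is a minimal pure-Python sketch of softmax cross entropy that skips ignored labels. It assumes one common convention, namely that `reduction="mean"` divides by the number of non-ignored samples; whether that is exactly the behavior this commit corrects is an assumption. `cross_entropy_sketch` is a hypothetical helper, not the code in python/paddle/nn/functional/loss.py.

```python
import math


def cross_entropy_sketch(logits, labels, ignore_index=-100, reduction="mean"):
    """Softmax cross entropy over rows of logits, skipping ignored labels.

    Assumption: 'mean' divides by the number of non-ignored samples.
    """
    losses = []
    for row, label in zip(logits, labels):
        if label == ignore_index:
            losses.append(0.0)  # ignored samples contribute no loss
            continue
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        losses.append(log_sum_exp - row[label])
    if reduction == "none":
        return losses
    total = sum(losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        valid = sum(1 for label in labels if label != ignore_index)
        return total / valid if valid else 0.0
    raise ValueError("reduction must be 'none', 'sum', or 'mean'")


logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
labels = [0, -100, 1]  # the middle sample is ignored
mean_loss = cross_entropy_sketch(logits, labels)
sum_loss = cross_entropy_sketch(logits, labels, reduction="sum")
per_sample = cross_entropy_sketch(logits, labels, reduction="none")
```

With uniform logits over two classes each valid sample has loss ln 2, so the mean over the two valid samples is ln 2 rather than (2/3)·ln 2, which is what dividing by the full batch size would give.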

13 Jan, 2021 2 commits
- Recompute Offload (#30233) (#30372) · 3fbc3cf4
  Committed by JZ-LIANG on Jan 13, 2021
- Support unused parameters in dynamic graph distributed (#30224) (#30374) · 020e2431
  Committed by ShenLiang on Jan 13, 2021
12 Jan, 2021 6 commits

[cherry]Add callback after TensorCopy (#30123) (#30268) · 9d0a1eb4

Committed by Leo Chen on Jan 12, 2021

* change to tensor copy sync

* change to tensor copy sync

* make copy_to safe when use TensorCopy

* refine code

* add ut

* add cudapinned garbagecollector

* add testcase: cpu place -> cuda pinned place

[2.0 Cherry-pick]fix 2.0 error message (#30332) · df67b317

Committed by swtkiwi on Jan 12, 2021

* fix datanorm error msg (#30294)

* Optimize the error message of framework. (#30134)

* modify error message based on comments (#30189)

* modify error message based on comments

* edit code according to review.

* Correct spelling according to review.

* fix enforce msg of sum xpu op (#30113)

* enhance error info for py_func (#30138)

* enhance error info for py_func

* update

* fix elugradgrad test fail & error message opt (#30171)

* fix elugradgrad test fail and error message opt

* fix unitest,test=develop

* Update prroi_pool_op.h

fix error message

* opt message,test=develop

* fix ci fail,test=develop

* Refine PADDLE_ENFORCE Error Messages. test=develop (#30149)

Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc

* enhance error message, test=develop (#30220)

* fix error message for distribute_fpn_proposals_op (#30116)

* enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240)

* just add the op error message for the matmul xpu (#30246)

 add the op error message for the matmul xpu

* enhance error message of nll_loss op test=develop (#30125)

* enhance error message of nll_loss op test=develop
Co-authored-by: yaoxuefeng <yaoxuefeng@baidu.com>
Co-authored-by: xiemoyuan <71377852+xiemoyuan@users.noreply.github.com>
Co-authored-by: WeiXin <weixin10@baidu.com>
Co-authored-by: Jack Zhou <zhoushunjie@baidu.com>
Co-authored-by: Wilber <jiweibo@baidu.com>
Co-authored-by: Double_V <liuvv0203@163.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com>
Co-authored-by: zhang wenhui <frankwhzhang@126.com>
Co-authored-by: wangguanzhong <jerrywgz@126.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
Co-authored-by: lijianshe02 <48898730+lijianshe02@users.noreply.github.com>

cherry pick tensor table (#30221) · 330aea6e
Committed by Chengmo on Jan 12, 2021

[cherry-pick]memory optimization for fuse pattern of elemwise_add + act (#30303) · b207b8a7

Committed by wangchaochaohu on Jan 12, 2021

* reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op (relu Op for example) (#29885)

* register OPMaker and Infer Shape Check for fused_elementwise_add (#30259)

[Cherry-pick]Fix the accuracy problem of allclose op when using float64 data... · 2db79f0a

Committed by Zhen Wang on Jan 12, 2021

[Cherry-pick]Fix the accuracy problem of allclose op when using float64 data type in static mode.(#29890) (#30313)

* Fix the accuracy problem of allclose op when using float64 data type in static mode.

* Format the code style.
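
One plausible way an allclose check can lose accuracy with float64 data is when its tolerances pass through float32 on the way to the comparison. The sketch below illustrates that effect in plain Python using the usual rule |a - b| <= atol + rtol * |b|. It is an illustration under that assumption, not a description of the actual root cause or fix in this commit; `as_float32` and `allclose_sketch` are hypothetical helpers.

```python
import struct


def as_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def allclose_sketch(a, b, rtol=1e-5, atol=1e-8):
    """Elementwise |a - b| <= atol + rtol * |b|."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


a, b = [1e-8], [0.0]
exact = allclose_sketch(a, b, atol=1e-8)                # tolerance kept in float64
rounded = allclose_sketch(a, b, atol=as_float32(1e-8))  # tolerance rounded to float32
print(exact, rounded)  # True False
```

The float32 value nearest to 1e-8 is slightly smaller than 1e-8, so a difference sitting exactly on the boundary passes with the float64 tolerance and fails with the rounded one.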

[Cherry-pick] Complex grad for matmul, kron and type promotion (#30304) · 7346edc2

Committed by chentianyu03 on Jan 12, 2021

* complex gradient matmul  (#29966)

* dot op support complex types

* matmul support complex types

* add test case

* matmul broadcast gradient support complex

* move conjFunctor to complex_functor.h

* change the kron gradient when complex types (#29995)

* type promotion for grad (#30177)

* type promotion for grad

* add type promotion for div op

11 Jan, 2021 3 commits

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value... · d839761e

Committed by liym27 on Jan 11, 2021

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value support vector<double> as value (#30126) (#30305)

Cherry-Pick #30126
1. Support vector<float64> as type of op attribute.
2. op set_value supports float64 numpy.array

[cherry pick] Fix bug for 'save multiple method' (#30218) (#30278) · d9c70217

Committed by WeiXin on Jan 11, 2021

* Fix bug for 'save multiple method'

* To pass coverage.

* edit code to pass coverage.

* edit code to pass coverage.

* add unittest for coverage.

* change for coverage.

* edit for coverage.

[Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze,... · 7c943a65

Committed by pangyoki on Jan 11, 2021

[Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) (#30258)

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allocation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput
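
The "View (reuse allocation)" idea named in this commit can be pictured as follows: a shape-only operation such as squeeze or reshape returns a new tensor object that points at the same underlying buffer instead of copying it. The sketch below is a toy model in plain Python. `FlatTensorSketch` and its methods are hypothetical and do not mirror Paddle's VarBase or its op implementations.

```python
class FlatTensorSketch:
    """Hypothetical tensor: a shape plus a reference to a flat buffer."""

    def __init__(self, buffer, shape):
        size = 1
        for dim in shape:
            size *= dim
        if size != len(buffer):
            raise ValueError("shape does not match buffer size")
        self.buffer = buffer  # shared by reference, never copied
        self.shape = tuple(shape)

    def reshape_view(self, shape):
        """Return a new tensor object that reuses this tensor's buffer."""
        return FlatTensorSketch(self.buffer, shape)

    def squeeze_view(self):
        """Drop size-1 dimensions without copying data."""
        return self.reshape_view([d for d in self.shape if d != 1])


base = FlatTensorSketch([0, 1, 2, 3, 4, 5], (1, 2, 3))
view = base.squeeze_view()
view.buffer[0] = 42  # a write through the view is visible in the base tensor
print(view.shape, base.buffer[0])  # (2, 3) 42
```

Sharing the buffer avoids an allocation and a copy, at the cost that writes through either object are visible through the other.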
