Commits · 3fbc3cf4356a205eef9ae552d4614c2ec23f5c97 · Crayon鑫 / Paddle

13 Jan, 2021 · 3 commits

Recompute Offload (#30233) (#30372) · 3fbc3cf4

Committed by JZ-LIANG on Jan 13, 2021

3fbc3cf4

split ps with distributed (#30337) · a97ca56a

Committed by tangwei12 on Jan 13, 2021
```
Change-Id: I3c788e7576688e63181e7f01562529b85a09cc59
```
a97ca56a

git cherry-pick the commits of operator version registries, test=release/2.0 (#30292) · 5eab1a38

Committed by 石晓伟 on Jan 13, 2021

* Register op version for grid_sampler, test=op_version (#29916)

* add op version for fake_quant and fake_dequant ops, test=op_version (#29923)

* Register op version for print, test=op_version (#29945)

* add gru op_register_version; test=op_version; (#29931)

* Register op version for coalesce_tensor. (#29940)

* register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version (#29937)

* add op_register_version for allclose op; test=op_version (#29968)

* register ModifyAttr for instance_norm, test=op_version (#29938)

* add op_version for flip op [test=op_version] (#30019)

* add the op version check for the elementwise ops, test=op_version (#30010)

* add the support the op version check for matmul, test=op_version (#30011)

* Revert "register ModifyAttr for instance_norm, test=op_version (#29938)"

* add REGISTER_OP_VERSION for generate_proposals, roi_align, roi_pool test=op_version (#30034)

* Fix rank_attention op_version, test=op_version (#30006)

* fix rank_attention, test=op_version

* Register op version for linspace,test=op_version (#30025)

* fix op_register_version for compare ops, test=op_version (#30007)
Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>

* register ModifyAttr for instance_norm, test=op_version (#30065)

* register instance norm, test=op_version

* add trace op_register_version and fix version bug; test=op_version (#30000)

* fix a bug in op_version_registry, test=develop, test=op_version (#29994)

* Add version checking, test=op_version (#30129)

* fix a bug in gaussian_random_op version, test=release/2.0
Co-authored-by: LielinJiang <50691816+LielinJiang@users.noreply.github.com>
Co-authored-by: cc <52520497+juncaipeng@users.noreply.github.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: Jack Zhou <zhoushunjie@baidu.com>
Co-authored-by: Guo Sheng <whucsgs@163.com>
Co-authored-by: wangxinxin08 <69842442+wangxinxin08@users.noreply.github.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
Co-authored-by: FlyingQianMM <245467267@qq.com>
Co-authored-by: ceci3 <ceci3@users.noreply.github.com>
Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: chalsliu <45041955+chalsliu@users.noreply.github.com>
Co-authored-by: wangguanzhong <jerrywgz@126.com>
Co-authored-by: ShenLiang <shenliang03@baidu.com>
Co-authored-by: yinhaofeng <66763551+yinhaofeng@users.noreply.github.com>
Co-authored-by: channings <chenlingchi@baidu.com>
Co-authored-by: chentianyu03 <chentianyu03@baidu.com>
Co-authored-by: ruri <shipeng1108@163.com>

5eab1a38

12 Jan, 2021 · 6 commits

[cherry]Add callback after TensorCopy (#30123) (#30268) · 9d0a1eb4

Committed by Leo Chen on Jan 12, 2021

* change to tensor copy sync

* change to tensor copy sync

* make copy_to safe when use TensorCopy

* refine code

* add ut

* add cudapinned garbagecollector

* add testcase: cpu place -> cuda pinned place

9d0a1eb4

【Cherry-Pick】Fix device_context & Save Tensor & Gloo (#30336) · 284bae99

Committed by Chengmo on Jan 12, 2021

* Fix server.h include device_context (#30243)

* fix cmake
Co-authored-by: seiriosPlus <tangwei12@baidu.com>

* 【Paddle.Fleet】Support local save sparse param (#30175)

* add save tensor support
Co-authored-by: seiriosPlus <tangwei12@baidu.com>

* add sparse embedding & load vars for 2.0 & gloo bug fix (#30306)

* add sparse embedding & load vars for 2.0

Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b

* fix hdfs gloo

Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6

* fix gloo hdfs

Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e

* move loadvar/sparse embedding from incubate to static

Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0
Co-authored-by: tangwei12 <tangwei12@baidu.com>

284bae99

[2.0 Cherry-pick]fix 2.0 error message (#30332) · df67b317

Committed by swtkiwi on Jan 12, 2021

* fix datanorm error msg (#30294)

* Optimize the error message of framework. (#30134)

* modify error message based on comments (#30189)

* modify error message based on comments

* edit code according to review.

* Correct spelling according to review.

* fix enforce msg of sum xpu op (#30113)

* enhance error info for py_func (#30138)

* enhance error info for py_func

* update

* fix elugradgrad test fail & error message opt (#30171)

* fix elugradgrad test fail and error message opt

* fix unittest, test=develop

* Update prroi_pool_op.h

fix error message

* opt message,test=develop

* fix ci fail,test=develop

* Refine PADDLE_ENFORCE Error Messages. test=develop (#30149)

Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc

* enhance error message, test=develop (#30220)

* fix error message for distribute_fpn_proposals_op (#30116)

* enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240)

* just add the op error message for the matmul xpu (#30246)

 add the op error message for the matmul xpu

* enhance error message of nll_loss op test=develop (#30125)

* enhance error message of nll_loss op test=develop
Co-authored-by: yaoxuefeng <yaoxuefeng@baidu.com>
Co-authored-by: xiemoyuan <71377852+xiemoyuan@users.noreply.github.com>
Co-authored-by: WeiXin <weixin10@baidu.com>
Co-authored-by: Jack Zhou <zhoushunjie@baidu.com>
Co-authored-by: Wilber <jiweibo@baidu.com>
Co-authored-by: Double_V <liuvv0203@163.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com>
Co-authored-by: zhang wenhui <frankwhzhang@126.com>
Co-authored-by: wangguanzhong <jerrywgz@126.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
Co-authored-by: lijianshe02 <48898730+lijianshe02@users.noreply.github.com>

df67b317

[cherry-pick] use cuda generator in bernoulli cuda kernel (#30199) #30286 · e7cbc43f

Committed by Leo Chen on Jan 12, 2021
```
[cherry-pick] use cuda generator in bernoulli cuda kernel (#30199)
```
e7cbc43f

cherry pick tensor table (#30221) · 330aea6e

Committed by Chengmo on Jan 12, 2021

330aea6e

[cherry-pick]memory optimization for fuse pattern of elemwise_add + act (#30303) · b207b8a7

Committed by wangchaochaohu on Jan 12, 2021

* reduce the occupied memory size for the fused pattern of elementwise_add Op and activation Op (relu Op for example) (#29885)

* register OPMaker and Infer Shape Check for fused_elementwise_add (#30259)

b207b8a7

11 Jan, 2021 · 4 commits

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value... · d839761e

Committed by liym27 on Jan 11, 2021

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value supports vector<double> as value (#30126) (#30305)

Cherry-Pick #30126
1. Support vector<float64> as type of op attribute.
2. op set_value supports float64 numpy.array

d839761e

[cherry-pick] Async drop scope in executor (#29714) #30285 · 93ce7f69

Committed by Leo Chen on Jan 11, 2021
```
[cherry-pick] Async drop scope in executor (#29714)
```
93ce7f69

[Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54

Committed by Zhen Wang on Jan 11, 2021

* Support pure fp16 training for AMP API. (#29544)

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefine error about MPTypeTrait on windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Improve the doc of `amp_init`.

* Change for fp16 testing if users have the infer program defined in separate way.

* Remove tensor copy in the update_loss_scaling op. (#29426)

* remove tensor copy in the update_loss_scaling op

* not use thrust.

* fix some cuda memory access error.
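
The `check_finite_and_unscale` and `update_loss_scaling` steps named in this commit follow the general pattern of dynamic loss scaling. The sketch below illustrates that general pattern in plain Python only; it is not the Paddle implementation, and the class, field names, and default values are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class LossScaler:
    """Toy dynamic loss scaler. All names and defaults here are hypothetical."""
    scale: float = 2.0 ** 15
    incr_every_n_steps: int = 2       # grow after this many finite steps in a row
    decr_every_n_nan_or_inf: int = 1  # shrink after this many overflow steps in a row
    incr_ratio: float = 2.0
    decr_ratio: float = 0.5
    good_steps: int = 0
    bad_steps: int = 0

    def check_finite_and_unscale(self, scaled_grads):
        """Return (unscaled_grads, found_inf) for a list of float gradients."""
        found_inf = any(not math.isfinite(g) for g in scaled_grads)
        unscaled = [g / self.scale for g in scaled_grads]
        return unscaled, found_inf

    def update(self, found_inf):
        """Adjust the scale from the overflow flag of the current step."""
        if found_inf:
            self.good_steps = 0
            self.bad_steps += 1
            if self.bad_steps >= self.decr_every_n_nan_or_inf:
                self.scale = max(self.scale * self.decr_ratio, 1.0)
                self.bad_steps = 0
        else:
            self.bad_steps = 0
            self.good_steps += 1
            if self.good_steps >= self.incr_every_n_steps:
                self.scale *= self.incr_ratio
                self.good_steps = 0


scaler = LossScaler(scale=1024.0)
grads, overflow = scaler.check_finite_and_unscale([2048.0, float("inf")])
scaler.update(overflow)  # an overflow was seen, so the scale shrinks
```

In a training loop built this way, the optimizer step would be skipped whenever the overflow flag is set, and the scale recovers after a run of finite steps.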

d8dfef54

[cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f

Committed by WangXi on Jan 11, 2021

* Optimization grad merge performance (#29784)

* [fleet] combine amp and gradient merge, test=develop (#30086)

* fix assign_op_xpu concat_op_xpu warning (#30120)
Co-authored-by: liuyuhui <liuyuhui@baidu.com>

e283dc6f

08 Jan, 2021 · 1 commit

[Cherry-pick] [Complex] Simplify prepared op impl to improve performance (#30153) (#30215) · 0e3a1d35

Committed by Chen Weihang on Jan 08, 2021

* simplify prepared op impl to improve performance

* fix kunlun compile error

* continue fix kunlun compile error

* only transform diff place when dtype diff

* fix failed unittests

* remove useless file

* polish impl by review comment

0e3a1d35

07 Jan, 2021 · 1 commit

fix xpu pe sync, test=notest (#30095) (#30114) · 85545bbc

Committed by liuyuhui on Jan 07, 2021

85545bbc

06 Jan, 2021 · 1 commit

[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for... · 743649b5

Committed by liym27 on Jan 06, 2021

[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842) (#30105)

Before this PR, SharePlaceHolderWith shared a Tensor between different C++ Variables, which means sharing the data, shape, and inplace_version_counter_ of the Tensor.
But in some cases, sharing data and inplace_version_counter_ without sharing shape is needed. For example, the inplace op reshape can't share shape.

This PR discards SharePlaceHolderWith and exposes ShareInplaceVersionCounterWith for C++ Tensor.
This reverts commit b10ecd9d.

* Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase
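
The distinction this commit message draws (share the data and the version counter, but not the shape) can be illustrated with a toy model. This is a plain-Python sketch, not Paddle's C++ Tensor; `ToyTensor`, `VersionCounter`, and `reshape_view` are hypothetical names introduced only for the example.

```python
class VersionCounter:
    """Counts how many times the underlying data was modified in place."""

    def __init__(self):
        self.version = 0

    def bump(self):
        self.version += 1


class ToyTensor:
    """Hypothetical stand-in for a tensor: storage, shape, version counter."""

    def __init__(self, data, shape):
        self.data = data      # flat list used as storage
        self.shape = shape    # owned per tensor, never shared
        self.counter = VersionCounter()

    def share_inplace_version_counter_with(self, other):
        # Both tensors now observe the same version number.
        self.counter = other.counter


def reshape_view(src, new_shape):
    """Build a tensor over the same storage and counter, with its own shape."""
    out = ToyTensor(src.data, new_shape)
    out.share_inplace_version_counter_with(src)
    return out


x = ToyTensor([1, 2, 3, 4, 5, 6], (2, 3))
y = reshape_view(x, (3, 2))

# Writing through y is an in-place change that x must also see.
y.data[0] = 10
y.counter.bump()
```

Here `x` keeps shape `(2, 3)` while `y` has `(3, 2)`, yet a write through either one bumps the single shared counter.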

743649b5

05 Jan, 2021 · 2 commits

add topo-aware in heter-ps (#30087) (#30117) · 7fc2ce50

Committed by Thunderbrook on Jan 05, 2021
```
* add topo aware

* resource.h

* topo aware

* format
```
7fc2ce50

[cherry-pick] Add mkldnn interpolate op, support manual enable mkldnn interpolate op (#30083) · 9a6926f5

Committed by cc on Jan 05, 2021

9a6926f5

04 Jan, 2021 · 1 commit

fix op version checker of pass bug (#30028) (#30084) · 477b0c46

Committed by Shang Zhizhou on Jan 04, 2021

477b0c46

31 Dec, 2020 · 1 commit

[Cherry-pick] Disable gloo by default #29559 #29805 (#29601) · 640f8cf0

Committed by lilong12 on Dec 31, 2020

* update, test=develop (#29559)

* Disable gloo by default (#29805)

* update, test=develop

* update, test=develop

640f8cf0

29 Dec, 2020 · 6 commits

[Kunlun] 2.0 cherry-pick:Support for Baidu Kunlun XPU multi card training (#29713) · 847aa172

Committed by liuyuhui on Dec 29, 2020

* [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)

* [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)

* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)

* add bkcl.so in whl for kunlun (#29947)

* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29961)
Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>

847aa172

[Cherry-pick] Complex network execute support (#29905) · 91ebc460

Committed by Chen Weihang on Dec 29, 2020

* [Complex] Add support for complex grad accumulated (#29889)

* add support for complex grad accumulated

* add unittest for coverage

* update test dtype

* remove useless blank line

* [Complex] Handle complex to real after type promotion (#29855)

* try to add fwd op input dtypes

* refactor base impl

* return tmp_ins after dygraph prepare data

* fix typo found in debug

* polish comment & add complex net test

* revert detail change

* fix unittest failed

* add complex kernel condition control

* fix xpu test failed & polish comment

* polish details by review comments

* Complex op test (#29753)

* delete no need to calculate inputs in dygraph op_test

* delete no need to calculate inputs in dygraph op_test

* change grad elementwise_mul for complex types (#29757)

* add conj op for complex types

* add conj for complex types

* add more test case

* add conj_op test

* modify conj api and impl

* add complex type for fill_constant_op xpu

* add setConstant for complex type

* remove complex conj test file

* user define grad for test_conj_op

* add test case for static mode of conj api

* modify conj doc

* change input args name to x

* remove useless codes

* conj support real types

* add conj test case for real number

* delete no need to calculate inputs in dygraph op_test

* delete no need to calculate inputs in dygraph op_test

* modify grad of mul for complex types

* fix the grads of inputs args order not match bug

* change the grad of div when complex types (#29804)

* change the grad of div when complex types

* fix the grads of inputs args order not match bug
Co-authored-by: chentianyu03 <chentianyu03@baidu.com>

91ebc460

[cherry-pick] #26920 , #22924 (#29948) · bea300dd

Committed by 石晓伟 on Dec 29, 2020

bea300dd

[cherry-pick] map matmul/squeeze2+matmul/reshape2+matmul to mul #29911 (#29980) · 160b3477

Committed by cc on Dec 29, 2020

160b3477

cherry pick heter ps (#29955) · a839ddca

Committed by Thunderbrook on Dec 29, 2020
```
* cherry pick heter ps

* CMakeList
```
a839ddca

[Inference] Solve 2.0 trt performance reduce compare 1.8. (#29925) (#29964) · bf653d8e

Committed by Wilber on Dec 29, 2020

bf653d8e

25 Dec, 2020 · 2 commits

feat: support check_nan_inf for kunlun/xpu device (#29694) (#29898) · 41917fb5

Committed by QingshuChen on Dec 25, 2020
```
* feat: support check_nan_inf for kunlun device

* support kunlun stack

* minor
```
41917fb5

2 0 ps core 2 (#29894) · f781ab08

Committed by tangwei12 on Dec 25, 2020

* add ps table (#29463)

* add ps table

Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178

* add service (#29560)

* add service, remove ut on mac

* fix heter_profiler & add heter stop method

* fix code style

* merge pscore

Change-Id: Ie7f60d1cdde6755a0c29db26863c6283e9843d57

* fix cmake

Change-Id: I6773509a7b4ca79139ecc40b7bf3eb318ceff8bb

* fix conflict

Change-Id: I35575be0c96a8520f9d756ea7f1ff0b904a165ba

* fix conflict

Change-Id: Ic926ea0b0d67803226d51241397ba3b510226bfa

f781ab08

23 Dec, 2020 · 1 commit

update the operator registration for incompatible upgrade, test=develop (#29720) (#29774) · 8bc0a31b

Committed by 石晓伟 on Dec 23, 2020

8bc0a31b

22 Dec, 2020 · 1 commit

fleet sync build strategy, test=develop (#29732) (#29745) · f8888a07

Committed by WangXi on Dec 22, 2020

f8888a07

21 Dec, 2020 · 1 commit

[oneDNN] Making ThreadID info in caching key optional (#29272) (#29598) · 2352a8af

Committed by Jacek Czaja on Dec 21, 2020

2352a8af

18 Dec, 2020 · 1 commit

[Cherry-pick] Add complex api conj, real and imag (#29750) · ab5cc042

Committed by Chen Weihang on Dec 18, 2020

* Add complex dtype op (add) test example (#29603)


* add op test case for complex

* polish code details

* add xpu set constant support

* fix argument error

* remove useless pyc file

* [Complex] Add real & imag op and api for complex tensor (#29672)

* add complex real op & api & unittest

* add imag op & api & unittest

* refactor op impl

* revert simplify writing due to compile failed

* polish details

* polish grad op code

* add conj op for complex types (#29527)

* add conj op for complex types

* add conj for complex types

* add more test case

* add conj_op test

* modify conj api and impl

* add complex type for fill_constant_op xpu

* add setConstant for complex type

* remove complex conj test file

* user define grad for test_conj_op

* add test case for static mode of conj api

* modify conj doc

* change input args name to x

* remove useless codes

* conj support real types

* add conj test case for real number
Co-authored-by: chentianyu03 <chentianyu03@baidu.com>

ab5cc042

16 Dec, 2020 · 1 commit

[2.0/cherrypick] cherry-pick Sharding PR:29518 (#29593) · ab04bf01

Committed by JZ-LIANG on Dec 16, 2020

* Sharding add hybrid-dp feature

* update sharding in distributed_strategy

* update sharding unittest

* revise code format for sharding

ab04bf01

07 Dec, 2020 · 1 commit

Use different name_scope for different conv type, test=develop (#29355) (#29410) · f223c786

Committed by cc on Dec 07, 2020

f223c786

04 Dec, 2020 · 2 commits

[cherry-pick 2.0rc1][inplace] Add ShareHolderWith for class Variable and... · efb5ad62

Committed by liym27 on Dec 04, 2020

[cherry-pick 2.0rc1][inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267) (#29359)

efb5ad62

Support type promote for basic math ops (quantum required) (#29265) (#29354) · 0e7539e7

Committed by Chen Weihang on Dec 04, 2020

* basic impl of type promote

* add comment & another testcase

* fix complex bugs & support python op promote type

* fix failed unittests & polish code

* add unittest for coverage

* change to only promote complex type

* polish code details

* polish several comments

0e7539e7

03 Dec, 2020 · 1 commit

[cherry-pick]Change the api of DataParallel and Fleet (#29288) · ec57656e

Committed by ShenLiang on Dec 03, 2020
```
* Change the api of DataParallel and Fleet (#29224)
```
ec57656e

02 Dec, 2020 · 1 commit

Hot fix compile failed in gcc4.8 caused by complex impl (#29254) (#29274) · 40bad648

Committed by Chen Weihang on Dec 02, 2020
```
* hot fix compile failed in gcc4.8

* fix failed unittest
```
40bad648

01 Dec, 2020 · 1 commit

add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice operator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

8f45d142

30 Nov, 2020 · 1 commit

Check whether there is any inplace operation affecting gradient calculation. (#27901) · 865a4598

Committed by liym27 on Nov 30, 2020

* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.

* Add a new attribute `_inplace_version` for VarBase.

* Raise exception if an inplace operation can result in incorrect gradient computation.

* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.

* For api assign, call _bump_inplace_version() when it's an inplace operation in dynamic mode.

* Use original var_wrapper if the inplace_version is not changed.

* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performance.
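
The check described in the points above can be sketched as follows: record a variable's inplace version when it is saved for the backward pass, then compare it when the gradient is computed. This is a minimal plain-Python illustration under that assumption, not Paddle's code; `ToyVar`, `SavedForBackward`, and the two `square_*` functions are hypothetical.

```python
class ToyVar:
    """Hypothetical variable with a value and an inplace version number."""

    def __init__(self, value):
        self.value = value
        self.inplace_version = 0

    def bump_inplace_version(self):
        # Called whenever the value is overwritten in place.
        self.inplace_version += 1


class SavedForBackward:
    """Snapshot of a variable plus the version it had during forward."""

    def __init__(self, var):
        self.var = var
        self.snapshot_version = var.inplace_version

    def get(self):
        if self.var.inplace_version != self.snapshot_version:
            raise RuntimeError(
                "variable was modified in place after forward: expected "
                f"version {self.snapshot_version}, got {self.var.inplace_version}"
            )
        return self.var


def square_forward(x):
    """y = x^2; the input is saved because the gradient needs it."""
    return x.value ** 2, SavedForBackward(x)


def square_backward(saved, grad_out):
    """dy/dx = 2x, valid only if x is unchanged since forward."""
    x = saved.get()
    return 2 * x.value * grad_out


x = ToyVar(3.0)
y, saved = square_forward(x)
grad_ok = square_backward(saved, 1.0)  # x untouched, so this succeeds

# An in-place write after forward invalidates the saved input.
x.value = 5.0
x.bump_inplace_version()
```

Calling `square_backward(saved, 1.0)` again after the in-place write raises `RuntimeError` instead of silently returning a gradient computed from the overwritten value.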

865a4598

Crayon鑫 / Paddle is in sync with the fork source project
