提交 · ae75affd34b5308f8456bed4ecb94bba6321cb24 · Crayon鑫 / Paddle

15 1月, 2021 2 次提交

【Cherry-Pick】add distributed_infer (#30300) (#30427) · ae75affd

由 123malin 提交于 1月 15, 2021

* test=develop, add distributed_infer (#30300)

* test=develop, add distributed_infer

* test=develop, fix unittest cmakefile conflict

* test=develop, fix test_dist_fleet_base

ae75affd

W
Cherrypick fix rnn batch size diff (#30462) · e0e98627
由 wawltor 提交于 1月 15, 2021
```
* fix the rnn mask memory bug for out of read

* update the code for the rnn
```
e0e98627

14 1月, 2021 8 次提交
- S
  
  fix flatten api grad (#30426) (#30441) · 8b5307bf
  由 ShenLiang 提交于 1月 14, 2021
  
  8b5307bf
- L
  
  [cherry-pick] correct the allowed dimension size (#30326) (#30433) · 35c8eaf5
  由 lidanqing 提交于 1月 14, 2021
  
  35c8eaf5
- C
  
  skip quantizing ops in cpu inference (#30342) (#30405) · 2f16e0c6
  由 cc 提交于 1月 14, 2021
  
  2f16e0c6
- Q
  optimize memcpy perf for kunlun (#30291) (#30382) · 9de42be2
  由 QingshuChen 提交于 1月 14, 2021
```
* optimize memcpy perf for kunlun (#30291)

* optimize memcpy perf for kunlun

* remove useless unitest for kunlun mean

* minor

* fix bug that cann't find mkldnn(kunlun) (#30394)
```
  9de42be2
- L
  [cherrypick 2.0] add double grad for conv_transpose and depthwise_conv (#30429) · 1552343a
  由 LielinJiang 提交于 1月 14, 2021
```
* Add double grad for conv_transpose (#29706)

* add double grad for conv_transpose

* register cudnn conv double grad for depthwise conv (#29807)
```
  1552343a
- Z
  
  [cherry-pick 2.0]enable MakeCipher api for inference (#30389) · ac70275a
  由 Zhang Jun 提交于 1月 14, 2021
  
  ac70275a
- A
  
  Added support for inference using quantization aware trained dygraph (#30288) (#30402) · 38faed7f
  由 alncat 提交于 1月 14, 2021
  
  38faed7f
- G
  Softmax backward optimize (#30249) (#30400) · 4cc0337f
  由 GaoWei8 提交于 1月 14, 2021
```
* softmax backward optimize
```
  4cc0337f
13 1月, 2021 6 次提交

[cherry-pick] Set expected place in child thread for dataloader #30383 · 9fb5a3e5

由 Leo Chen 提交于 1月 13, 2021

* set expected place in child thread for dataloader

* set device id when set tensor from numpy

* revert tensor_py change

* add compile guard

* fix ci

* fix bug

9fb5a3e5

J

Recompute Offload (#30233) (#30372) · 3fbc3cf4
由 JZ-LIANG 提交于 1月 13, 2021

3fbc3cf4
S

Support unused parameters in dynamic graph distributed (#30224) (#30374) · 020e2431
由 ShenLiang 提交于 1月 13, 2021

020e2431
C
[Cherry-pick] Remove c++ stacktrace open hint #30341 · 428c884f
由 Chen Weihang 提交于 1月 13, 2021
```
[Cherry-pick] Remove c++ stacktrace open hint，cherry-pick of #30325
```
428c884f
T
split ps with distributed (#30337) · a97ca56a
由 tangwei12 提交于 1月 13, 2021
```
Change-Id: I3c788e7576688e63181e7f01562529b85a09cc59
```
a97ca56a

石

git cherry-pick the commits of operator version registries, test=release/2.0 (#30292) · 5eab1a38

由石晓伟提交于 1月 13, 2021

* Register op version for grid_sampler, test=op_version (#29916)

* add op version for fake_quant and fake_dequant ops, test=op_version (#29923)

* Register op version for print, test=op_version (#29945)

* add gru op_register_version; test=op_version; (#29931)

* Register op version for coalesce_tensor. (#29940)

* register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version (#29937)

* add op_register_version for allclose op; test=op_version (#29968)

* register ModifyAttr for instance_norm, test=op_version (#29938)

* add op_version for flip op [test=op_version] (#30019)

* add the op version check for the elementwise ops, test=op_version (#30010)

* add the support the op version check for matmul, test=op_version (#30011)

* Revert "register ModifyAttr for instance_norm, test=op_version (#29938)"

* add REGISTER_OP_VERSION for generate_proposals, roi_align, roi_pool test=op_version (#30034)

* Fix rank_attention op_version, test=op_version (#30006)

* fix rank_attention, test=op_version

* Register op version for linspace,test=op_version (#30025)

* fix op_register_version for compare ops, test=op_version (#30007)
Co-authored-by: Nzhoushunjie <zhoushunjie@baidu.com>

* register ModifyAttr for instance_norm, test=op_version (#30065)

* register instance norm, test=op_version

* add trace op_register_version and fix version bug; test=op_version (#30000)

* fix a bug in op_version_registry, test=develop, test=op_version (#29994)

* Add version checking, test=op_version (#30129)

* fix a bug in gaussian_random_op version, test=release/2.0
Co-authored-by: NLielinJiang <50691816+LielinJiang@users.noreply.github.com>
Co-authored-by: Ncc <52520497+juncaipeng@users.noreply.github.com>
Co-authored-by: NQi Li <qili93@qq.com>
Co-authored-by: NJack Zhou <zhoushunjie@baidu.com>
Co-authored-by: NGuo Sheng <whucsgs@163.com>
Co-authored-by: Nwangxinxin08 <69842442+wangxinxin08@users.noreply.github.com>
Co-authored-by: Nwawltor <fangzeyang0904@hotmail.com>
Co-authored-by: NFlyingQianMM <245467267@qq.com>
Co-authored-by: Nceci3 <ceci3@users.noreply.github.com>
Co-authored-by: Nhutuxian <hutuxian2011@sina.cn>
Co-authored-by: Nchalsliu <45041955+chalsliu@users.noreply.github.com>
Co-authored-by: Nwangguanzhong <jerrywgz@126.com>
Co-authored-by: NShenLiang <shenliang03@baidu.com>
Co-authored-by: Nyinhaofeng <66763551+yinhaofeng@users.noreply.github.com>
Co-authored-by: Nchannings <chenlingchi@baidu.com>
Co-authored-by: Nchentianyu03 <chentianyu03@baidu.com>
Co-authored-by: Nruri <shipeng1108@163.com>

5eab1a38

12 1月, 2021 8 次提交

[cherry]Add callback after TensorCopy (#30123) (#30268) · 9d0a1eb4

由 Leo Chen 提交于 1月 12, 2021

* change to tensor copy sync

* change to tensor copy sync

* make copy_to safe when use TensorCopy

* refine code

* add ut

* add cudapinned garbagecollector

* add testcase: cpu place -> cuda pinned place

9d0a1eb4

【Cherry-Pick】Fix device_context & Save Tensor & Gloo (#30336) · 284bae99

由 Chengmo 提交于 1月 12, 2021

* Fix server.h include device_context (#30243)

* fix cmake
Co-authored-by: NseiriosPlus <tangwei12@baidu.com>

* 【Paddle.Fleet】Support local save sparse param (#30175)

* add save tensor support
Co-authored-by: NseiriosPlus <tangwei12@baidu.com>

* add sparse embedding & load vars for 2.0 & gloo bug fix (#30306)

* add sparse embedding & load vars for 2.0

Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b

* fix hdfs gloo

Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6

* fix gloo hdfs

Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e

* move loadvar/sparse embedding from incubute to static

Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0
Co-authored-by: Ntangwei12 <tangwei12@baidu.com>

284bae99

[2.0 Cherry-pick]fix 2.0 error message (#30332) · df67b317

由 swtkiwi 提交于 1月 12, 2021

* fix datanorm error msg (#30294)

* Optimize the error message of framework. (#30134)

* modify error message based on comments (#30189)

* modify error message based on comments

* edit code according to review.

* Correct spelling according to review.

* fix enforce msg of sum xpu op (#30113)

* enhance error info for py_func (#30138)

* enhance error info for py_func

* update

* fix elugradgrad test fail & error message opt (#30171)

* fix elugradgrad test fail and error message opt

* fix unitest,test=develop

* Update prroi_pool_op.h

fix error message

* opt message,test=develop

* fix ci fail,test=develop

* Refine PADDLE_ENFORCE Error Messages. test=develop (#30149)

Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc

* enhance error message, test=develop (#30220)

* fix error message for distribute_fpn_proposals_op (#30116)

* enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240)

* just add the op error message for the matmul xpu (#30246)

 add the op error message for the matmul xpu

* enhance error message of nll_loss op test=develop (#30125)

* enhance error message of nll_loss op test=develop
Co-authored-by: Nyaoxuefeng <yaoxuefeng@baidu.com>
Co-authored-by: Nxiemoyuan <71377852+xiemoyuan@users.noreply.github.com>
Co-authored-by: NWeiXin <weixin10@baidu.com>
Co-authored-by: NJack Zhou <zhoushunjie@baidu.com>
Co-authored-by: NWilber <jiweibo@baidu.com>
Co-authored-by: NDouble_V <liuvv0203@163.com>
Co-authored-by: NHuihuang Zheng <zhhsplendid@gmail.com>
Co-authored-by: Nzhang wenhui <frankwhzhang@126.com>
Co-authored-by: Nwangguanzhong <jerrywgz@126.com>
Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Nwawltor <fangzeyang0904@hotmail.com>
Co-authored-by: Nlijianshe02 <48898730+lijianshe02@users.noreply.github.com>

df67b317

L
[cherry-pick] use cuda generator in bernoulli cuda kernel (#30199) #30286 · e7cbc43f
由 Leo Chen 提交于 1月 12, 2021
```
[cherry-pick] use cuda generator in bernoulli cuda kernel (#30199)
```
e7cbc43f
C

cherry pick tensor table (#30221) · 330aea6e
由 Chengmo 提交于 1月 12, 2021

330aea6e

[cherry-pick]memory optimization for fuse pattern of elemwise_add + act (#30303) · b207b8a7

由 wangchaochaohu 提交于 1月 12, 2021

* reduce the  occupied size  of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) (#29885)

* register OPMaker and Infer Shape Check for fused_elementwise_add (#30259)

b207b8a7

[Cherry-pick]Fix the accuracy problem of allclose op when using float64 data... · 2db79f0a

由 Zhen Wang 提交于 1月 12, 2021

[Cherry-pick]Fix the accuracy problem of allclose op when using float64 data type in static mode.(#29890) (#30313)

* Fix the accuracy problem of allclose op when using float64 data type in static mode.

* Format the code style.

2db79f0a

[Cherry-pick] Complex grad for matmul, kron and type promotion (#30304) · 7346edc2

由 chentianyu03 提交于 1月 12, 2021

* complex gradient matmul  (#29966)

* dot op support complex types

* matmul support complex types

* add test case

* matmul broadcast gradient support complex

* move conjFunctor to complex_functor.h

* change the kron gradient when complex types (#29995)

* type promotion for grad (#30177)

* type promotion for grad

* add type promotion for div op

7346edc2

11 1月, 2021 14 次提交

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value... · d839761e

由 liym27 提交于 1月 11, 2021

[Cherry-Pick] Support vector<double> as type of op attribute and op set_value suppport vector<double> as value (#30126) (#30305)

Cherry-Pick #30126
1. Support vector<float64> as type of op attribute.
2. op set_value suppports float64 numpy.array

d839761e

L
[cherry-pick] Async drop scope in executor (#29714) #30285 · 93ce7f69
由 Leo Chen 提交于 1月 11, 2021
```
[cherry-pick] Async drop scope in executor (#29714)
```
93ce7f69
L
[Cherry-Pick 2.0] Check the rank of input in kernel of set_value op (#30147) (#30301) · a2bbd06a
由 liym27 提交于 1月 11, 2021
```
cherry-pick #30147，For op set_value, check input's rank < 7
```
a2bbd06a

[cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t,... · 04cc659c

由 WeiXin 提交于 1月 11, 2021

[cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161) (#30280)

为curandStatus_t、cublasStatus_t、cusolverStatus_t添加详细的报错信息。
原始PR：#30161

04cc659c

[Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze,... · 7c943a65

由 pangyoki 提交于 1月 11, 2021

[Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) (#30258)

* add view strategy on squeeze,unsqueeze,reshape,flatten

* add squeeze unittest

* add unittests

* use View strategy as name rather than Reuse Allacation

* fix view api doc

* fix format

* use core.ops when input of reshape2 is Tensor

* fix test_cross_entropy_loss error because of reshape2

* delete selected_rows

* change op_function

* little change

* solve HandleViewBetweenInputAndOutput

7c943a65

Z
[cherry-pick]add cast cuda kernel (#29352) #30263 · afbc6367
由 Zhang Ting 提交于 1月 11, 2021
```
 add cast cuda kernel

cherry-pick #29352
```
afbc6367
W
[cherry-pick]add support for place string representation #30264 · fb66355e
由 wangchaochaohu 提交于 1月 11, 2021
```
cherry-pick #28769, add support for place string representation 
```
fb66355e

[cherry-pick]Elementwise add grad GPU kernel optimization (#30276) · e59524f8

由 wangchaochaohu 提交于 1月 11, 2021

* elementwise_add_grad Op optimization  (#29575)

* optimize for long width for elementwise (#29602)

* refine (#29622)

* delete the code for fp16 optimization because it is not faster than common template code (#29715)

* fix the shape choose of vectorize for cuda

* optimization for fp16 elementwise add (#29744)

* Fix the compiler error for half type (#29799)

* refine the compiler error for half2 operation (#29816)

* fix the compiler error when gcc4 cuda9.0 (#29997)

e59524f8

[Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54

由 Zhen Wang 提交于 1月 11, 2021

* Support pure fp16 training for AMP API. (#29544)

* add cast ops before and after unsupported fp16 ops.

* Keep partial net in FP32 pattern.

* Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.

* Add fp16 support for adam op.

* add multi precision attr for adam.

* Fix the bug of test_multi_precision_fp16_train UT.

* Code format for CI.

* Fix the redefine error about MPTypeTrait on windows.

* fix bugs of the _create_accumulators func in Momentum.

* fix bug when inserting post cast op.

* Add the update_loss_scaling op in allow_set of UnusedVarCheck.

* Update for ci coverage.

* Add some doc for OptimizerWithMixedPrecision.

* Fix the code style.

* Imporve the doc of `amp_init`.

* Change for fp16 testing if users have the infer program defined in separate way.

* Remove tensor copy in the update_loss_scaling op. (#29426)

* remove tensor copy in the update_loss_scaling op

* not use thrust.

* fix some cuda memory access error.

d8dfef54

[Cherry pick] improve dropout (#30260) · b4931ab1

由 Zhang Ting 提交于 1月 11, 2021

* improve dropout (#29465)

* improve drop out

* add VectorizedRandomGeneratorWithGenerator

* fix bug

* modify according to comments

* improve dropout grad (#29605)

* improve grad perf

* fix the bug of dropout_grad (#29813)

b4931ab1

[cherry-pick] softmax optimize (#30279) · b80beb16

由 GaoWei8 提交于 1月 11, 2021

* Softmax vectorization (#29404)

* vec softmax fw

* vec softmax bw

* add a message argument for compiler compatibility

* optimize softmax forward (#30217)

* optimize softmax forward
Co-authored-by: Nzlsh80826 <zlsh80826@gmail.com>

b80beb16

W

Cherry-pick 30194 30164 30201(#30202) · 36de178a
由 Wilber 提交于 1月 11, 2021

36de178a

[cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f

由 WangXi 提交于 1月 11, 2021

* Optimization grad merge performance (#29784)

* [fleet] combine amp and gradient merge, test=develop (#30086)

* fix assign_op_xpu concat_op_xpu warining (#30120)
Co-authored-by: Nliuyuhui <liuyuhui@baidu.com>

e283dc6f

add aarch64 and sunway kunlun lib (#30027) (#30237) · eacbd488

由 QingshuChen 提交于 1月 11, 2021

* add aarch64 and sunway kunlun lib

* minor

* optimize elementwise_add for kunlun

* update kunlun dependence

* minor

* minor

eacbd488

08 1月, 2021 2 次提交

[cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negetive,... · 5fe3da39

由 liym27 提交于 1月 08, 2021

[cherry-pick 2.0] Fix bug: In dynamic mode, if start or end is negetive, __getitem__  return wrong result(#30003) (#30146)

1. when slice_item is a slice:
 1) the start of __getitem__ should be std::max(start, 0) if slice
 2) the start of __getitem__ should be std::min(end, dim)
2. when slice_item is an integer, it should be in [-dim_len, dim_len)
3. Fix error message to use accurate data

5fe3da39

[Cherry-Pick 2.0][setitem] Support Tensor setitem in static mode (#29708) (#30104) · f46ddc0e

由 liym27 提交于 1月 08, 2021

1. Type of index: int, slice(step must be 1).

2. Type of value:
 (1) int32, int64, float32, bool;
 (2) numpy.array(int32, int64, float32, bool);<Note: float64 is not supported>
 (3) paddle.Tensor(int32, int64, float32, float64, bool);

f46ddc0e

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致